Comprehend
What is Amazon Comprehend?
Amazon Comprehend is a fully managed natural language processing (NLP) service that uses machine learning to extract insights and relationships from text, with no ML expertise required.
It can detect sentiment, key phrases, named entities, and language, and it also supports custom classification and custom entity recognition.
Key Use Cases
| Use Case | Description |
|---|---|
| Sentiment analysis | Understand tone in product reviews, emails, tweets, etc. |
| Entity extraction | Extract names, dates, organizations, locations |
| Text classification | Categorize support tickets, feedback, news, etc. |
| Key phrase extraction | Pull important terms from paragraphs |
| Language detection | Identify the language of the given text |
| Custom classifiers | Train models to classify content in domain-specific ways |
| Custom entity recognition | Detect domain-specific entities (e.g., product codes, IDs) |
| Medical text analysis | (Comprehend Medical) Extract medical conditions, medications, and protected health information (PHI) |
Features Overview
| Feature | API / Feature Name | Description |
|---|---|---|
| Language detection | `DetectDominantLanguage` | Auto-detects the dominant language of the input text |
| Entity recognition | `DetectEntities` | Extracts people, places, dates, organizations |
| Key phrase detection | `DetectKeyPhrases` | Highlights important phrases |
| Sentiment analysis | `DetectSentiment` | Returns Positive, Negative, Neutral, or Mixed, with a confidence score for each |
| Syntax analysis | `DetectSyntax` | Part-of-speech tagging (noun, verb, etc.) |
| PII detection | `DetectPiiEntities`, `ContainsPiiEntities` | Locates PII (email, SSN, phone, etc.) in text, or reports which PII types are present |
| Custom classification | `CreateDocumentClassifier` | Train and use a classifier with your labeled dataset |
| Custom entity recognition | `CreateEntityRecognizer` | Train a model to detect domain-specific named entities |
| Comprehend Medical | `DetectEntitiesV2`, `DetectPHI` (separate `comprehendmedical` API) | Specialized medical text analysis (HIPAA eligible) |
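As one illustration of the PII row, the sketch below masks detected spans using the character offsets that `DetectPiiEntities` returns for each entity. The helper name `redact_pii`, the `min_score` threshold, and the sample data are assumptions made for this example, not part of the service.

```python
def redact_pii(text, entities, min_score=0.5):
    """Replace each detected PII span with its type, e.g. [EMAIL]."""
    # Work from the end of the string so earlier offsets stay valid
    # after each replacement changes the string length.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if ent["Score"] < min_score:
            continue  # skip low-confidence detections
        text = text[:ent["BeginOffset"]] + f"[{ent['Type']}]" + text[ent["EndOffset"]:]
    return text


# Hypothetical data shaped like the "Entities" list in a DetectPiiEntities response.
sample_text = "Contact jane@example.com for details."
sample_entities = [
    {"Type": "EMAIL", "Score": 0.99, "BeginOffset": 8, "EndOffset": 24},
]
print(redact_pii(sample_text, sample_entities))
```

With a real client, the entity list would come from `boto3.client("comprehend").detect_pii_entities(Text=text, LanguageCode="en")["Entities"]`.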
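Supported Languages (Standard Comprehend)
- English, Spanish, French, German, Italian, Portuguese
- Hindi (limited), Japanese, Korean, Arabic, Chinese (Simplified), Chinese (Traditional)
- Coverage varies by API: not every operation supports every language listed here, so check the language table for the API you plan to call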
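Because language coverage differs between APIs, a common pattern is to detect the language first and only call an analysis API when that language is supported. The sketch below works on a response shaped like `DetectDominantLanguage` output; the helper name `pick_supported_language` and the `SUPPORTED` set are assumptions for illustration, and the set should be replaced with the languages documented for the API you actually call.

```python
# Assumption: an illustrative subset of language codes; adjust per API.
SUPPORTED = {"en", "es", "fr", "de", "it", "pt"}


def pick_supported_language(response, supported=SUPPORTED):
    """Return the highest-scoring language code if supported, else None."""
    languages = response.get("Languages", [])
    if not languages:
        return None
    best = max(languages, key=lambda lang: lang["Score"])
    return best["LanguageCode"] if best["LanguageCode"] in supported else None


# Hypothetical response fragment shaped like DetectDominantLanguage output.
sample_response = {
    "Languages": [
        {"LanguageCode": "es", "Score": 0.97},
        {"LanguageCode": "pt", "Score": 0.02},
    ]
}
print(pick_supported_language(sample_response))
```

A `None` result signals that the caller should translate the text or skip analysis.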
Example: Analyze a Product Review
Input:
"The new headphones are amazing! Great sound and battery life."
Output:
- Sentiment: Positive
- Entities: headphones (`COMMERCIAL_ITEM`, the entity type Comprehend uses for products)
- Key Phrases: new headphones, great sound, battery life
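A summary like the one above can be assembled from three separate calls. The following is a minimal sketch, assuming a Boto3 Comprehend client is passed in; the function name `analyze_review` and the shape of the returned dictionary are choices made for this example.

```python
def analyze_review(client, text, language_code="en"):
    """Combine sentiment, entities, and key phrases for one piece of text."""
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    return {
        "sentiment": sentiment["Sentiment"],
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
    }

# Usage (requires AWS credentials):
#   import boto3
#   summary = analyze_review(boto3.client("comprehend"),
#                            "The new headphones are amazing!")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub in unit tests.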
Python (Boto3) Examples
Detect Sentiment

    import boto3

    # Region and credentials come from the standard AWS configuration chain.
    client = boto3.client('comprehend')

    response = client.detect_sentiment(
        Text="I love this product. It works flawlessly.",
        LanguageCode='en'
    )
    print(response['Sentiment'])  # POSITIVE

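For many documents, `BatchDetectSentiment` accepts up to 25 texts per request. The sketch below splits a longer list into batches and maps results back to their original positions; `batch_sentiment` and `BATCH_LIMIT` are names chosen for this example, and documents reported in the response's `ErrorList` are simply left as `None`.

```python
BATCH_LIMIT = 25  # maximum documents per BatchDetectSentiment request


def batch_sentiment(client, texts, language_code="en"):
    """Return one sentiment label per input text (None where a document failed)."""
    labels = [None] * len(texts)
    for start in range(0, len(texts), BATCH_LIMIT):
        chunk = texts[start:start + BATCH_LIMIT]
        response = client.batch_detect_sentiment(
            TextList=chunk, LanguageCode=language_code
        )
        # "Index" is the position within this chunk, so add the chunk offset.
        for result in response["ResultList"]:
            labels[start + result["Index"]] = result["Sentiment"]
    return labels
```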
Detect Key Phrases

    # Reuses `client` from the sentiment example.
    response = client.detect_key_phrases(
        Text="The movie had great visual effects and soundtrack.",
        LanguageCode='en'
    )
    for phrase in response['KeyPhrases']:
        print(phrase['Text'], " - Confidence:", phrase['Score'])

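Since every key phrase carries a confidence score, a small post-processing step can keep only the confident, distinct phrases. This is a sketch over the response dictionary; `top_key_phrases` and the 0.9 threshold are arbitrary choices for illustration.

```python
def top_key_phrases(response, min_score=0.9):
    """Return distinct key phrases at or above min_score, highest score first."""
    seen = set()
    phrases = []
    ranked = sorted(response["KeyPhrases"], key=lambda p: p["Score"], reverse=True)
    for item in ranked:
        key = item["Text"].lower()  # treat case variants as duplicates
        if item["Score"] >= min_score and key not in seen:
            seen.add(key)
            phrases.append(item["Text"])
    return phrases
```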
Detect Entities

    # Reuses `client` from the sentiment example.
    response = client.detect_entities(
        Text="Barack Obama was the 44th President of the United States.",
        LanguageCode='en'
    )
    for entity in response['Entities']:
        print(entity['Type'], entity['Text'])

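Downstream code often wants entities grouped by type rather than as a flat list. A minimal sketch, assuming the response shape returned by `DetectEntities`; the helper name `group_entities` and the 0.8 score cutoff are illustrative.

```python
from collections import defaultdict


def group_entities(response, min_score=0.8):
    """Map each entity type to the list of matching texts above min_score."""
    grouped = defaultdict(list)
    for entity in response["Entities"]:
        if entity["Score"] >= min_score:
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)
```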
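Custom Classification (Advanced)
You can:
- Train a custom classifier on labeled text documents stored in S3 (CSV format)
- Create and train the model with `CreateDocumentClassifier`
- Classify new text with the `ClassifyDocument` API (real time, through an endpoint) or with `StartDocumentClassificationJob` (asynchronous batch)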
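Once a classifier is trained and deployed behind an endpoint, real-time classification might look like the sketch below. It assumes a multi-class model, whose `ClassifyDocument` response contains a `Classes` list; `top_class` is a name chosen for this example and the endpoint ARN is supplied by the caller.

```python
def top_class(client, text, endpoint_arn):
    """Return (label, score) for the most likely class, or None if none returned."""
    response = client.classify_document(Text=text, EndpointArn=endpoint_arn)
    classes = response.get("Classes", [])
    if not classes:
        return None
    best = max(classes, key=lambda c: c["Score"])
    return best["Name"], best["Score"]
```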
Dataset Format (CSV)
Each line holds the label in the first column and the document text in the second, with no header row:

    billing,The customer has a billing issue with their last invoice.
    support,I can't connect to the internet since yesterday.

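Training files are easier to get right when generated with a CSV writer, which quotes any text containing commas. This sketch assumes the two-column label-then-text layout that Comprehend documents for multi-class training; the function name `write_training_csv` is hypothetical.

```python
import csv
import io


def write_training_csv(rows, handle):
    """Write (label, text) pairs as one CSV line per document, no header."""
    writer = csv.writer(handle, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for label, text in rows:
        # Collapse runs of whitespace (including embedded newlines)
        # so that each document stays on a single line.
        writer.writerow([label, " ".join(text.split())])


rows = [
    ("billing", "The customer has a billing issue with their last invoice."),
    ("support", "I can't connect to the internet since yesterday."),
]
buffer = io.StringIO()
write_training_csv(rows, buffer)
print(buffer.getvalue(), end="")
```

The same function can write to a file opened with `open(path, "w", newline="", encoding="utf-8")` before the file is uploaded to S3.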
Pricing (2024)
| Feature | Price (per unit) |
|---|---|
| Text analysis (standard) | ~$0.0001 per unit (~$1.00 per 10,000 units); 1 unit = 100 characters, with a 3-unit minimum per request |
| Custom classification | ~$3.00 per hour (training) + usage fees |
| PII detection | ~$0.0001 per unit |
| Comprehend Medical | ~$0.0015 per unit |
| Free tier | 50K units/month for 12 months |
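A rough cost estimate follows from the billing unit. The sketch below assumes the published rule for the standard APIs (one unit per 100 characters, rounded up, with a 3-unit minimum per request) and takes the per-unit price as a parameter, since prices change; the default of 0.0001 and the function names are assumptions for illustration only.

```python
import math

UNIT_CHARS = 100  # assumption: one billing unit per 100 characters
MIN_UNITS = 3     # assumption: minimum units charged per request


def billable_units(text):
    """Units charged for one request containing `text`."""
    return max(MIN_UNITS, math.ceil(len(text) / UNIT_CHARS))


def estimate_cost(texts, price_per_unit=0.0001):
    """Approximate USD cost of analyzing each text in its own request."""
    return sum(billable_units(t) for t in texts) * price_per_unit
```

Short texts are dominated by the minimum charge, so a 50-character tweet costs the same as a 300-character review under these assumptions.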
Security & Compliance
| Feature | Support |
|---|---|
| IAM control | Yes |
| Logging with CloudTrail | Yes |
| HIPAA eligibility | Yes (Comprehend Medical) |
| KMS integration | Yes (for S3 training data) |
| VPC endpoints | Yes (via AWS PrivateLink) |
Terraform Support
Comprehend support in the Terraform AWS provider is limited to custom model resources: document classifiers and entity recognizers.
Custom Classifier Example:

    # aws_iam_role.comprehend_access is assumed to be defined elsewhere.
    resource "aws_comprehend_document_classifier" "my_classifier" {
      name                 = "support-ticket-classifier"
      data_access_role_arn = aws_iam_role.comprehend_access.arn
      language_code        = "en"

      input_data_config {
        s3_uri      = "s3://my-dataset/classifier/train.csv"
        data_format = "COMPREHEND_CSV"
      }
    }

Service Integrations
| AWS Service | Integration Use Case |
|---|---|
| S3 | Store input documents or training datasets |
| Lambda | Analyze text as it is uploaded or submitted |
| SNS / SQS | Alert when classification is complete |
| Translate | Translate → Comprehend for multi-language support |
| QuickSight | Analyze sentiment trends or classification breakdowns |
| OpenSearch | Index structured text from Comprehend analysis |
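The Translate row can be sketched as a small pipeline: detect the language, translate to English when the source language is not one you want to analyze directly, then run sentiment analysis. Both clients are passed in; the function name `sentiment_any_language` and the `supported` default are assumptions for this example.

```python
def sentiment_any_language(translate, comprehend, text, supported=("en",)):
    """Return a sentiment label, translating to English first when needed."""
    detected = comprehend.detect_dominant_language(Text=text)["Languages"]
    if not detected:
        raise ValueError("no language detected")
    source = max(detected, key=lambda lang: lang["Score"])["LanguageCode"]
    if source not in supported:
        # Fall back to English via Amazon Translate.
        text = translate.translate_text(
            Text=text, SourceLanguageCode=source, TargetLanguageCode="en"
        )["TranslatedText"]
        source = "en"
    return comprehend.detect_sentiment(Text=text, LanguageCode=source)["Sentiment"]

# Usage (requires AWS credentials):
#   import boto3
#   label = sentiment_any_language(boto3.client("translate"),
#                                  boto3.client("comprehend"), "Me encanta.")
```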
TL;DR Summary
| Feature | Amazon Comprehend |
|---|---|
| Ready-made NLP | Yes |
| Language detection | Yes |
| Sentiment & key phrases | Yes |
| Entity recognition | Yes (standard and custom) |
| PII detection | Yes |
| Custom classification | Yes (via S3 CSV) |
| Terraform support | Partial (custom models only) |
| Free tier | 50K units/mo for 12 months |
| Common integration | Lambda, Translate, S3, QuickSight |