Transcribe
What is Amazon Transcribe?
Amazon Transcribe is a fully managed automatic speech recognition (ASR) service that enables developers to convert speech into text quickly and at scale.
It supports real-time streaming and batch transcription of audio and video files in multiple languages, with features such as speaker identification, custom vocabulary, punctuation, and formatting.
Key Use Cases
| Use Case | Description |
|---|---|
| Voice-to-text transcription | Convert audio files (e.g., MP3, WAV) to readable text |
| Call center analytics | Transcribe customer-agent calls for compliance/QA |
| Subtitles & captions | Generate captions for videos, podcasts, or broadcasts |
| Multi-language transcription | Support for 100+ languages and dialects |
| NLP/ML post-processing | Use transcripts for summarization, sentiment analysis, etc. |
| Meeting transcription | Transcribe Zoom/Teams/Google Meet recordings |
Types of Transcribe Jobs
| Type | Use Case | API |
|---|---|---|
| Batch | Long-form transcription | StartTranscriptionJob |
| Streaming | Real-time voice input | StartStreamTranscription (HTTP/2 or WebSocket) |
| Medical | Medical domain conversations | StartMedicalTranscriptionJob (Amazon Transcribe Medical) |
| Call Analytics | Agent/customer analytics (speaker turns + sentiment) | StartCallAnalyticsJob |
Supported Languages (2024)
Amazon Transcribe supports 100+ languages, including:
- English (US, UK, IN, AU)
- Hindi, Tamil, Telugu
- Spanish, French, German
- Japanese, Korean, Mandarin
- Arabic, Farsi, Russian, and many more.
Key Features
| Feature | Description |
|---|---|
| Speaker identification | Identify "Speaker 1", "Speaker 2", etc. in multi-person audio |
| Custom vocabulary | Add uncommon terms, brand names, acronyms |
| Channel identification | Separate audio channels (e.g., agent vs customer) |
| Word-level timestamps | Start/end time for each word |
| Automatic punctuation | Commas, periods, question marks automatically added |
| Custom language models | Optional advanced tuning: train a domain-specific model on your own text data |
| Content redaction | Mask or remove sensitive data (PII) |
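Several of these features are switched on through extra arguments to the batch API. The sketch below shows one way to assemble them; `build_feature_options` is a hypothetical helper, not part of Boto3, and it assumes the `Settings` and `ContentRedaction` request fields of `start_transcription_job`. It also assumes that speaker labels and channel identification cannot be combined in a single job, so it rejects that combination up front.

```python
def build_feature_options(max_speakers=None, vocabulary_name=None,
                          channel_identification=False, redact_pii=False):
    """Return extra keyword arguments for start_transcription_job (hypothetical helper)."""
    if max_speakers is not None and channel_identification:
        # Assumption: the service rejects requests that enable both options.
        raise ValueError("speaker labels and channel identification cannot be combined")

    settings = {}
    if max_speakers is not None:
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if channel_identification:
        settings["ChannelIdentification"] = True
    if vocabulary_name:
        settings["VocabularyName"] = vocabulary_name

    kwargs = {}
    if settings:
        kwargs["Settings"] = settings
    if redact_pii:
        # "redacted" keeps only the masked transcript in the output.
        kwargs["ContentRedaction"] = {
            "RedactionType": "PII",
            "RedactionOutput": "redacted",
        }
    return kwargs
```

The returned dictionary can be unpacked into the call, for example `transcribe.start_transcription_job(..., **build_feature_options(max_speakers=2))`.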
Architecture Overview
    +-------------------+
    |    Audio File     |  (.mp3, .mp4, .wav, .flac)
    +---------+---------+
              |
    +---------+---------+
    | Amazon Transcribe |
    +---------+---------+
              |
    +---------+---------+
    |   Transcription   |
    |  JSON/Plaintext   |
    +-------------------+
You can optionally send the transcript to:
- S3
- Lambda
- SNS/SQS
- Amazon Comprehend (for NLP)
Input/Output Formats
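| Property | Details |
|---|---|
| Input formats | .mp3, .mp4, .wav, .flac (the most common audio/video types) |
| Sampling rate | 8 kHz to 48 kHz for batch jobs; 8 kHz is typical of telephony audio, and 16 kHz or higher suits most other recordings |
| Output | JSON (structured), which also contains the full transcript as plain text |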
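Because the `MediaFormat` argument usually mirrors the file extension, it can be derived from the S3 URI instead of being hard-coded. This is a minimal sketch; `media_format_from_uri` is a hypothetical helper, and it deliberately accepts only the four formats listed in the table above.

```python
from pathlib import PurePosixPath

# Only the formats named in this document; the service may accept others.
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac"}


def media_format_from_uri(media_uri):
    """Map an S3 URI such as s3://bucket/audio.mp3 to a MediaFormat value."""
    suffix = PurePosixPath(media_uri).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported or missing file extension: {media_uri!r}")
    return suffix
```

Failing early on an unknown extension avoids starting a job that would be rejected later.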
Pricing Overview (2024)
| Type | Price |
|---|---|
| Standard | ~$0.024 per audio minute |
| Medical | ~$0.036 per audio minute |
| Streaming | ~$0.0004 per second |
| Custom Vocabulary | Free |
| Storage (S3) | Charged separately if stored |
Free Tier: 60 minutes/month for 12 months.
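As a rough worked example of the arithmetic, the sketch below estimates a monthly batch bill from the approximate standard rate quoted above. `estimate_batch_cost` is a hypothetical helper; the default rate is the document's figure, not an authoritative price, and actual pricing varies by Region and volume tier.

```python
from decimal import Decimal

# Approximate standard batch rate from the table above (USD per audio minute).
STANDARD_RATE_PER_MINUTE = Decimal("0.024")


def estimate_batch_cost(audio_minutes, free_minutes=0, rate=STANDARD_RATE_PER_MINUTE):
    """Estimate the cost in USD after subtracting any free-tier minutes."""
    billable = max(Decimal(str(audio_minutes)) - Decimal(str(free_minutes)), Decimal(0))
    return billable * rate
```

For instance, 100 audio minutes with the 60 free-tier minutes applied leaves 40 billable minutes, or about $0.96 at the quoted rate.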
Sample Python Code (Boto3): Batch
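    import boto3

    # Client for the Amazon Transcribe batch API (uses the default credentials and Region).
    transcribe = boto3.client('transcribe')

    # Start an asynchronous job; the job name must be unique within the account.
    response = transcribe.start_transcription_job(
        TranscriptionJobName='MyJob',
        Media={'MediaFileUri': 's3://my-bucket/audio.mp3'},
        MediaFormat='mp3',
        LanguageCode='en-US',
        OutputBucketName='my-output-bucket',
    )
    print("Started:", response['TranscriptionJob']['TranscriptionJobName'])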
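The call above returns as soon as the job is queued, so the caller still has to find out when the transcript is ready. A minimal polling sketch follows, assuming the `get_transcription_job` response shape (`TranscriptionJobStatus`, `Transcript.TranscriptFileUri`, `FailureReason`). `wait_for_job` is a hypothetical helper that takes the client as a parameter, which also makes it easy to exercise with a stub.

```python
import time


def wait_for_job(client, job_name, poll_seconds=5.0, max_polls=120):
    """Poll until the job completes and return the transcript file URI."""
    for _ in range(max_polls):
        job = client.get_transcription_job(
            TranscriptionJobName=job_name
        )["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription job failed"))
        # QUEUED or IN_PROGRESS: wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_name!r} did not finish after {max_polls} polls")
```

With a real client this would be called as `wait_for_job(boto3.client('transcribe'), 'MyJob')`. For production workloads, an event-driven trigger is usually preferable to polling.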
Example JSON Output
    {
      "jobName": "MyJob",
      "results": {
        "transcripts": [
          {"transcript": "Hello, welcome to our customer service hotline."}
        ],
        "items": [
          {
            "start_time": "0.54",
            "end_time": "0.96",
            "alternatives": [{"confidence": "1.0", "content": "Hello"}],
            "type": "pronunciation"
          }
        ]
      }
    }
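The numeric fields in this output are strings, so they need converting before use. The sketch below pulls word-level timings out of a transcript shaped like the example above; `words_with_timestamps` is a hypothetical helper, and it skips any item whose `type` is not `pronunciation` on the assumption that such items (for example punctuation) carry no timestamps.

```python
import json


def words_with_timestamps(transcript_json):
    """Return (word, start_seconds, end_seconds, confidence) tuples."""
    doc = json.loads(transcript_json)
    words = []
    for item in doc["results"]["items"]:
        if item.get("type") != "pronunciation":
            continue  # assumed to have no start_time/end_time
        best = item["alternatives"][0]
        words.append((
            best["content"],
            float(item["start_time"]),
            float(item["end_time"]),
            float(best["confidence"]),
        ))
    return words
```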
Real-time Streaming (WebSocket)
Amazon Transcribe supports WebSocket-based real-time transcription (e.g., for call center apps, live captioning, etc.).
You need:
- Audio in PCM 16-bit linear format
- A WebSocket client (or an SDK such as the Amazon Chime SDK or the Transcribe Streaming SDK)
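Whichever client is used, streaming audio is sent as a sequence of small chunks rather than as one file. The sketch below splits raw 16-bit PCM into fixed-duration chunks; `pcm_chunks` is a hypothetical helper, and the 16 kHz mono input and 100 ms chunk duration are illustrative assumptions rather than service requirements.

```python
def pcm_chunks(audio, sample_rate_hz=16000, chunk_ms=100, channels=1):
    """Yield successive chunks of raw 16-bit PCM audio, each chunk_ms long."""
    bytes_per_sample = 2  # 16-bit linear PCM
    samples_per_chunk = sample_rate_hz * chunk_ms // 1000
    chunk_size = samples_per_chunk * bytes_per_sample * channels
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for offset in range(0, len(audio), chunk_size):
        # The final chunk may be shorter than chunk_size.
        yield audio[offset:offset + chunk_size]
```

At 16 kHz mono, each 100 ms chunk is 1,600 samples, or 3,200 bytes. Each chunk would then be wrapped in whatever audio event framing the chosen client or SDK expects.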
Amazon Transcribe Medical
| Feature | Description |
|---|---|
| Domain | "PRIMARYCARE" or "CARDIOLOGY", etc. |
| HIPAA eligible | Yes (must sign BAA) |
| Use case | Medical dictation, doctor-patient dialogue |
| Format | .wav, .mp3, .flac, etc. |
| Output | Medical JSON (with PII redacted optionally) |
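A medical batch job takes a slightly different set of arguments from a standard one. The sketch below assembles them for `start_medical_transcription_job`, assuming its `Specialty` and `Type` fields; `medical_job_request` is a hypothetical helper, and the defaults (`PRIMARYCARE`, `CONVERSATION`, `en-US`) are illustrative choices.

```python
def medical_job_request(job_name, media_uri, output_bucket,
                        specialty="PRIMARYCARE", job_type="CONVERSATION"):
    """Build keyword arguments for start_medical_transcription_job."""
    if job_type not in ("CONVERSATION", "DICTATION"):
        raise ValueError("job_type must be 'CONVERSATION' or 'DICTATION'")
    return {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,
        "Specialty": specialty,
        # DICTATION suits a single clinician; CONVERSATION suits doctor-patient dialogue.
        "Type": job_type,
    }
```

The result can be passed as `transcribe.start_medical_transcription_job(**medical_job_request(...))`.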
Terraform for IAM Setup (Transcribe Batch)
    # Role that the Transcribe service can assume.
    resource "aws_iam_role" "transcribe_role" {
      name = "transcribe-role"
      assume_role_policy = jsonencode({
        Version = "2012-10-17",
        Statement = [{
          Effect = "Allow",
          Principal = {
            Service = "transcribe.amazonaws.com"
          },
          Action = "sts:AssumeRole"
        }]
      })
    }

    # Inline policy: read input audio from and write transcripts to the bucket.
    resource "aws_iam_role_policy" "transcribe_policy" {
      name = "allow-transcribe-access"
      role = aws_iam_role.transcribe_role.id
      policy = jsonencode({
        Version = "2012-10-17",
        Statement = [{
          Effect = "Allow",
          Action = [
            "s3:GetObject",
            "s3:PutObject"
          ],
          Resource = "arn:aws:s3:::your-bucket-name/*"
        }]
      })
    }
Note: There is no direct Terraform support for starting jobs; use Lambda or the Boto3 SDK instead.
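One way to follow that advice is a Lambda function that starts a job whenever audio lands in S3. The sketch below assumes the standard S3 event notification layout (`Records[0].s3.bucket.name` and `.object.key`). `job_name_for_key` and `start_job_for_upload` are hypothetical helpers, and the client is passed in as a parameter so the logic can be tried without AWS access.

```python
import re
from urllib.parse import unquote_plus


def job_name_for_key(key, request_id):
    """Build a job name from the object key, replacing disallowed characters."""
    safe = re.sub(r"[^0-9a-zA-Z._-]", "-", key)
    # Assumption: job names are limited to 200 characters.
    return f"{request_id}-{safe}"[:200]


def start_job_for_upload(event, transcribe_client, output_bucket, request_id):
    """Start a transcription job for the first object in an S3 event."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])  # keys arrive URL-encoded
    job_name = job_name_for_key(key, request_id)
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        LanguageCode="en-US",
        OutputBucketName=output_bucket,
    )
    return job_name
```

Inside a real handler this might be called as `start_job_for_upload(event, boto3.client('transcribe'), 'my-output-bucket', context.aws_request_id)`, with the Lambda execution role granted permission to start transcription jobs.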
Security & Compliance
| Feature | Description |
|---|---|
| IAM Policies | Secure access to S3, Transcribe APIs |
| KMS Encryption | For S3 buckets and results |
| HIPAA | Supported (Transcribe Medical) |
| Audit Trails | CloudTrail supported |
TL;DR Summary
| Feature | Amazon Transcribe |
|---|---|
| Batch transcription | Yes |
| Real-time transcription | Yes (streaming) |
| Speaker identification | Yes |
| Custom vocabulary | Yes |
| Medical transcription | Yes (with HIPAA support) |
| Languages supported | 100+ |
| Terraform support | Partial: only IAM & S3 setup, no job creation |
| Common integrations | S3, Lambda, Comprehend, SNS, Step Functions |