Transcribe

🧠 What is Amazon Transcribe?

Amazon Transcribe is a fully managed automatic speech recognition (ASR) service that enables developers to convert speech into text quickly and at scale.

✅ It supports real-time streaming and batch transcription for audio/video files in multiple languages, with features like speaker identification, custom vocabulary, punctuation, and formatting.

📦 Key Use Cases

| Use Case | Description |
| --- | --- |
| 🎙️ Voice-to-text transcription | Convert audio files (e.g., MP3, WAV) to readable text |
| 🧑‍⚖️ Call center analytics | Transcribe customer-agent calls for compliance/QA |
| 📋 Subtitles & captions | Generate captions for videos, podcasts, or broadcasts |
| 🌐 Multi-language transcription | Support for 100+ languages and dialects |
| 🧠 NLP/ML post-processing | Use transcripts for summarization, sentiment analysis, and similar tasks |
| 🧾 Meeting transcription | Transcribe Zoom/Teams/Google Meet recordings |

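🧰 Types of Transcribe Jobs

| Type | Use Case | API |
| --- | --- | --- |
| Batch | Long-form transcription | `StartTranscriptionJob` |
| Streaming | Real-time voice input | `StartStreamTranscription` over HTTP/2, or a WebSocket connection |
| Medical | Medical domain conversations | `StartMedicalTranscriptionJob` (Amazon Transcribe Medical) |
| Call Analytics | Agent/customer analytics with speaker turns and sentiment | `StartCallAnalyticsJob` |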

🌐 Supported Languages (2024)

Amazon Transcribe supports 100+ languages, including:

- English (US, UK, IN, AU)
- Hindi, Tamil, Telugu
- Spanish, French, German
- Japanese, Korean, Mandarin
- Arabic, Farsi, Russian, and many more.
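
When the spoken language is not known ahead of time, a batch job can ask Transcribe to identify it instead of passing a fixed `LanguageCode`. The sketch below only builds the request arguments; `build_language_id_request`, the job name, and the bucket are hypothetical, while `IdentifyLanguage` and `LanguageOptions` are parameters of `StartTranscriptionJob`.

```python
def build_language_id_request(job_name, media_uri, candidate_languages=None):
    """Build start_transcription_job arguments that use language identification."""
    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        # IdentifyLanguage takes the place of LanguageCode in the request.
        "IdentifyLanguage": True,
    }
    if candidate_languages:
        # Optional hint: restrict identification to the languages you expect.
        request["LanguageOptions"] = list(candidate_languages)
    return request


# Hypothetical job name and S3 URI, used only for illustration.
request = build_language_id_request(
    "MyJob-langid", "s3://my-bucket/audio.mp3", ["en-US", "hi-IN", "es-US"]
)
print(request)
```

The resulting dictionary could then be passed as `transcribe.start_transcription_job(**request)`.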

🎯 Key Features

| Feature | Description |
| --- | --- |
| Speaker identification | Label each speaker (`spk_0`, `spk_1`, etc.) in multi-person audio; also called speaker diarization |
| Custom vocabulary | Add uncommon terms, brand names, acronyms |
| Channel identification | Transcribe audio channels separately (e.g., agent vs customer) |
| Word-level timestamps | Start/end time for each word |
| Automatic punctuation | Commas, periods, question marks automatically added |
| Custom language models | Optional advanced tuning: train a model on domain-specific text |
| Content redaction | Mask or remove sensitive data (PII) |
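
Several of these features are switched on through extra arguments to `start_transcription_job`. As a minimal sketch, the hypothetical helper `build_feature_kwargs` below maps three of them onto the `Settings` and `ContentRedaction` parameters; the vocabulary name is an assumption and would have to exist in your account.

```python
def build_feature_kwargs(max_speakers=None, vocabulary_name=None, redact_pii=False):
    """Translate feature choices into start_transcription_job keyword arguments."""
    kwargs = {}
    settings = {}
    if max_speakers is not None:
        # Speaker labels need an upper bound on the number of speakers.
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if vocabulary_name:
        settings["VocabularyName"] = vocabulary_name
    if settings:
        kwargs["Settings"] = settings
    if redact_pii:
        # "redacted" asks for only the redacted transcript to be written.
        kwargs["ContentRedaction"] = {
            "RedactionType": "PII",
            "RedactionOutput": "redacted",
        }
    return kwargs


extra = build_feature_kwargs(
    max_speakers=2, vocabulary_name="my-product-terms", redact_pii=True
)
print(extra)
```

These arguments would be merged into the batch request, for example `start_transcription_job(TranscriptionJobName=..., Media=..., LanguageCode='en-US', **extra)`.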

🏗️ Architecture Overview

               +------------------+
               |    Audio File    | (.mp3, .mp4, .wav, .flac)
               +--------+---------+
                        |
              +-------------------+
              |  Amazon Transcribe|
              +--------+----------+
                       |
               +------------------+
               |   Transcription  |
               |  JSON / SRT/VTT  |
               +------------------+

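Transcribe stores the transcript in S3 (your own bucket or a service-managed one). From there you can pass it on to:

- Lambda (triggered by the S3 upload or by the job's EventBridge state-change event)
- SNS/SQS (to notify or queue downstream consumers)
- Amazon Comprehend (for NLP)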
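
Passing a transcript on to Amazon Comprehend could look roughly like the sketch below. The helper name `transcript_sentiment` is hypothetical, the client is injected so the function can be tried without AWS access, and the 5,000-byte cap is an assumption about the `detect_sentiment` text size limit that should be checked against current quotas.

```python
def transcript_sentiment(comprehend_client, transcript_text,
                         language_code="en", max_bytes=5000):
    """Return the overall sentiment label for (the start of) a transcript."""
    # Trim on a byte budget, then drop any character cut in half at the end.
    encoded = transcript_text.encode("utf-8")[:max_bytes]
    text = encoded.decode("utf-8", errors="ignore")
    response = comprehend_client.detect_sentiment(
        Text=text, LanguageCode=language_code
    )
    # One of POSITIVE, NEGATIVE, NEUTRAL, or MIXED.
    return response["Sentiment"]
```

With real credentials this would be called as `transcript_sentiment(boto3.client("comprehend"), text)`. Longer transcripts would need to be split and scored piece by piece, which this sketch does not attempt.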

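💡 Input/Output Formats

| Item | Details |
| --- | --- |
| Input formats | `.mp3`, `.mp4`, `.wav`, `.flac` are the most common audio/video types; `.ogg`, `.amr`, `.webm`, and `.m4a` are also accepted |
| Sampling rate | 8 kHz to 48 kHz for batch jobs; 8 kHz is typical for telephony audio, 16 kHz or higher for other recordings |
| Output | JSON (structured) transcript; batch jobs can also produce SRT or WebVTT subtitle files |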
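
The `MediaFormat` argument of a batch job has to match the input file. A small sketch of deriving it from the file extension follows; `media_format_from_uri` is a hypothetical helper, and the set of formats reflects the values `StartTranscriptionJob` accepts at the time of writing.

```python
from pathlib import PurePosixPath

# Values accepted by the MediaFormat parameter of StartTranscriptionJob.
SUPPORTED_MEDIA_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"}


def media_format_from_uri(media_uri):
    """Derive the MediaFormat value from the extension of an S3 URI or path."""
    suffix = PurePosixPath(media_uri).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED_MEDIA_FORMATS:
        raise ValueError(f"Unsupported media format: {suffix or 'none'}")
    return suffix


print(media_format_from_uri("s3://my-bucket/audio.MP3"))
```

Checking the extension up front avoids submitting a job that is bound to fail on an unsupported file type.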

💸 Pricing Overview (2024)

| Type | Price |
| --- | --- |
| Standard | ~$0.024 per audio minute |
| Medical | ~$0.075 per audio minute |
| Streaming | ~$0.0004 per second (about $0.024 per minute) |
| Custom Vocabulary | Free |
| Storage (S3) | Charged separately if stored |

🧠 Free Tier: 60 minutes/month for 12 months.
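
To turn these rates into a rough monthly figure, the sketch below multiplies billable minutes by a per-minute rate. `estimate_cost` is a hypothetical helper; the default rate is the approximate standard price from the table, and the sketch ignores volume tiers, regional differences, and per-request minimums, so treat the result as an estimate only.

```python
from decimal import Decimal


def estimate_cost(audio_minutes, rate_per_minute="0.024", free_minutes=0):
    """Estimate the charge in USD for a number of audio minutes."""
    billable = Decimal(str(audio_minutes)) - Decimal(str(free_minutes))
    # Usage inside the free allowance costs nothing.
    billable = max(billable, Decimal("0"))
    return billable * Decimal(rate_per_minute)


# 500 minutes in a month that still has the 60 free-tier minutes available.
cost = estimate_cost(500, free_minutes=60)
print(f"Estimated cost: ${cost:.2f}")
```

`Decimal` is used instead of floats so that small per-minute rates do not pick up rounding noise.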

🔧 Sample Python Code (Boto3): Batch

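import boto3

# Uses the default credential chain and region from your AWS configuration.
transcribe = boto3.client('transcribe')

# Job names must be unique within the account and region.
response = transcribe.start_transcription_job(
    TranscriptionJobName='MyJob',
    Media={'MediaFileUri': 's3://my-bucket/audio.mp3'},  # input file in S3
    MediaFormat='mp3',
    LanguageCode='en-US',
    OutputBucketName='my-output-bucket'  # omit to use a service-managed bucket
)

# The call returns immediately; the job itself runs asynchronously.
print("Started:", response['TranscriptionJob']['TranscriptionJobName'])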
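
Because the batch job runs asynchronously, the caller usually has to wait for it to finish. One simple approach is to poll `get_transcription_job`, sketched below; `wait_for_job` and its default intervals are assumptions, and an event-driven setup is often preferable to polling in production.

```python
import time


def wait_for_job(client, job_name, poll_seconds=10, timeout_seconds=3600):
    """Poll a transcription job until it completes; return the transcript URI."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.get_transcription_job(
            TranscriptionJobName=job_name
        )["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription job failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{job_name} did not finish in {timeout_seconds}s")
        # Status is still QUEUED or IN_PROGRESS; wait before asking again.
        time.sleep(poll_seconds)
```

With the client from the sample above, this would be called as `wait_for_job(transcribe, 'MyJob')`.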

Example JSON Output

{
  "jobName": "MyJob",
  "results": {
    "transcripts": [
      {"transcript": "Hello, welcome to our customer service hotline."}
    ],
    "items": [
      {
        "start_time": "0.54",
        "end_time": "0.96",
        "alternatives": [{"confidence": "1.0", "content": "Hello"}],
        "type": "pronunciation"
      }
    ]
  }
}
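
A transcript in this shape can be read with the standard `json` module. The sketch below pulls out the full text and the per-word timings; `extract_words` is a hypothetical helper, and the sample data is a trimmed copy of the example above with one punctuation item added, since punctuation items carry no timestamps.

```python
import json


def extract_words(raw_json):
    """Return (full_text, [(word, start, end, confidence), ...])."""
    results = json.loads(raw_json)["results"]
    full_text = results["transcripts"][0]["transcript"]
    words = []
    for item in results["items"]:
        if item["type"] != "pronunciation":
            continue  # skip punctuation, which has no start/end time
        best = item["alternatives"][0]
        words.append((
            best["content"],
            float(item["start_time"]),
            float(item["end_time"]),
            float(best["confidence"]),
        ))
    return full_text, words


sample = json.dumps({
    "jobName": "MyJob",
    "results": {
        "transcripts": [{"transcript": "Hello, welcome."}],
        "items": [
            {"start_time": "0.54", "end_time": "0.96", "type": "pronunciation",
             "alternatives": [{"confidence": "1.0", "content": "Hello"}]},
            {"type": "punctuation",
             "alternatives": [{"confidence": "0.0", "content": ","}]},
        ],
    },
})

text, words = extract_words(sample)
print(text)
print(words)
```

Note that timestamps and confidence values arrive as strings, so they are converted to floats before use.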

📡 Real-time Streaming (WebSocket)

Amazon Transcribe supports WebSocket-based real-time transcription for uses such as call center apps and live captioning.

You need:

- Audio encoded as 16-bit signed little-endian PCM (FLAC and Ogg Opus are also accepted)
- A WebSocket client, or an SDK such as the Amazon Chime SDK or the Amazon Transcribe streaming SDK
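
Streaming clients send audio in small pieces rather than as one file. The sketch below shows only that slicing step for raw PCM; `pcm_chunks` is a hypothetical helper, and the 100 ms chunk length, 16 kHz rate, and mono audio are assumptions. Framing and signing the chunks for the WebSocket connection is left to whichever SDK or client you use.

```python
def pcm_chunks(pcm_bytes, sample_rate_hz=16000, chunk_ms=100, bytes_per_sample=2):
    """Yield successive fixed-duration chunks of mono 16-bit PCM audio."""
    # Bytes per chunk = samples per second * bytes per sample * seconds.
    chunk_size = sample_rate_hz * bytes_per_sample * chunk_ms // 1000
    for offset in range(0, len(pcm_bytes), chunk_size):
        yield pcm_bytes[offset:offset + chunk_size]


# One second of silence at 16 kHz, 16-bit mono, as stand-in audio.
one_second = bytes(16000 * 2)
chunks = list(pcm_chunks(one_second))
print(len(chunks), "chunks of", len(chunks[0]), "bytes")
```

The final chunk may be shorter than the rest when the audio length is not a multiple of the chunk size.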

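🩺 Amazon Transcribe Medical

| Feature | Description |
| --- | --- |
| Specialty | `PRIMARYCARE` for batch jobs; streaming also accepts `CARDIOLOGY`, `NEUROLOGY`, `ONCOLOGY`, `RADIOLOGY`, and `UROLOGY` |
| HIPAA eligible | ✅ Yes (must sign BAA) |
| Use case | Medical dictation, doctor-patient dialogue |
| Format | `.wav`, `.mp3`, `.flac`, etc. |
| Output | Medical JSON transcript, optionally with PHI identified |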
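
A medical batch job uses its own API call, `start_medical_transcription_job`, with a few extra required arguments. The sketch below only assembles them; `build_medical_request`, the job name, and the bucket names are hypothetical.

```python
def build_medical_request(job_name, media_uri, output_bucket, conversation=True):
    """Build arguments for start_medical_transcription_job."""
    return {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",
        "Media": {"MediaFileUri": media_uri},
        # Medical jobs always write to a bucket that you own.
        "OutputBucketName": output_bucket,
        "Specialty": "PRIMARYCARE",
        # CONVERSATION for doctor-patient dialogue, DICTATION for notes.
        "Type": "CONVERSATION" if conversation else "DICTATION",
    }


medical_request = build_medical_request(
    "visit-001", "s3://my-bucket/visit-001.wav", "my-output-bucket"
)
print(medical_request)
```

The dictionary could then be passed as `transcribe.start_medical_transcription_job(**medical_request)`.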

⚙️ Terraform for IAM Setup (Transcribe Batch)

resource "aws_iam_role" "transcribe_role" {
  name = "transcribe-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect = "Allow",
      Principal = {
        Service = "transcribe.amazonaws.com"
      },
      Action = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "transcribe_policy" {
  name = "allow-transcribe-access"
  role = aws_iam_role.transcribe_role.id

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect = "Allow",
      Action = [
        "s3:GetObject",
        "s3:PutObject"
      ],
      Resource = "arn:aws:s3:::your-bucket-name/*"
    }]
  })
}

🧠 Note: The AWS Terraform provider has no resource for starting transcription jobs; start them from Lambda or with the Boto3 SDK.
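
A Lambda function that starts a job whenever audio lands in S3 might look like the sketch below. It assumes the function is wired to S3 object-created notifications; `job_request_from_s3_event`, `handler`, and the output bucket name are hypothetical. Job names are derived from the object key, so uploading the same key twice would collide with the earlier job name.

```python
import re
from urllib.parse import unquote_plus


def job_request_from_s3_event(event, output_bucket, language_code="en-US"):
    """Turn an S3 object-created event into start_transcription_job arguments."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["object"]["key"])
    # Job names allow only letters, digits, '.', '_' and '-', up to 200 chars.
    job_name = re.sub(r"[^0-9a-zA-Z._-]", "-", key)[:200]
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "LanguageCode": language_code,
        "OutputBucketName": output_bucket,
    }


def handler(event, context):
    """Lambda entry point: start one job for the uploaded object."""
    import boto3  # imported here so the helper above can be tested offline

    request = job_request_from_s3_event(event, output_bucket="my-output-bucket")
    boto3.client("transcribe").start_transcription_job(**request)
    return {"jobName": request["TranscriptionJobName"]}
```

The function's execution role would need permission to call `transcribe:StartTranscriptionJob` in addition to the S3 access shown in the Terraform above.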

🔐 Security & Compliance

| Feature | Description |
| --- | --- |
| IAM Policies | Secure access to S3, Transcribe APIs |
| KMS Encryption | For S3 buckets and results |
| HIPAA | ✅ Supported (Transcribe Medical) |
| Audit Trails | ✅ CloudTrail supported |

✅ TL;DR Summary

| Feature | Amazon Transcribe |
| --- | --- |
| Batch transcription | ✅ Yes |
| Real-time transcription | ✅ Yes (streaming) |
| Speaker identification | ✅ Yes |
| Custom vocabulary | ✅ Yes |
| Medical transcription | ✅ Yes (with HIPAA support) |
| Languages supported | ✅ 100+ |
| Terraform support | ⚠️ IAM, S3, custom vocabularies, and language models only; no job creation |
| Common integrations | S3, Lambda, Comprehend, SNS, Step Functions |