
Transcribe

🧠 What is Amazon Transcribe?

Amazon Transcribe is a fully managed automatic speech recognition (ASR) service that enables developers to convert speech into text quickly and at scale.

✅ It supports batch transcription of stored audio/video files and real-time streaming transcription in many languages, with features such as speaker identification, custom vocabularies, automatic punctuation, and formatting.


📦 Key Use Cases

| Use case | Description |
|---|---|
| 🎙️ Voice-to-text transcription | Convert audio files (e.g., MP3, WAV) to readable text |
| 🧑‍⚖️ Call center analytics | Transcribe customer-agent calls for compliance/QA |
| 📋 Subtitles & captions | Generate captions for videos, podcasts, or broadcasts |
| 🌐 Multi-language transcription | 100+ languages and dialects for batch jobs (streaming supports fewer) |
| 🧠 NLP/ML post-processing | Feed transcripts into summarization, sentiment analysis, etc. |
| 🧾 Meeting transcription | Transcribe Zoom/Teams/Google Meet recordings |

🧰 Types of Transcribe Jobs

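| Type | Use case | API |
|---|---|---|
| Batch | Long-form transcription of stored files | StartTranscriptionJob |
| Streaming | Real-time voice input | StartStreamTranscription (HTTP/2 or WebSocket) |
| Medical | Medical-domain conversations and dictation | StartMedicalTranscriptionJob / StartMedicalStreamTranscription (Amazon Transcribe Medical) |
| Call Analytics | Agent/customer analytics (speaker turns, sentiment) | StartCallAnalyticsJob |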

🌐 Supported Languages (2024)

Amazon Transcribe supports 100+ languages, including:

  • English (US, UK, IN, AU)

  • Hindi, Tamil, Telugu

  • Spanish, French, German

  • Japanese, Korean, Mandarin

  • Arabic, Farsi, Russian, and many more.


🎯 Key Features

| Feature | Description |
|---|---|
| Speaker identification | Label each speaker (spk_0, spk_1, etc.) in multi-speaker audio (diarization) |
| Custom vocabulary | Improve recognition of uncommon terms, brand names, and acronyms |
| Channel identification | Transcribe each audio channel separately (e.g., agent vs. customer) |
| Word-level timestamps | Start and end time for each word |
| Automatic punctuation | Commas, periods, and question marks are added automatically |
| Custom language models | Train a model on your own domain text for higher accuracy (generally available for selected languages) |
| Content redaction | Identify and mask sensitive data (PII) in transcripts |
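Several features in the table above are switched on through request parameters of the batch API. The following is a minimal sketch of assembling those parameters as a plain dictionary. The helper name `build_job_request` is hypothetical. The parameter names (`Settings.ShowSpeakerLabels`, `MaxSpeakerLabels`, `VocabularyName`, `ContentRedaction`) are those of boto3's `start_transcription_job`.

```python
from typing import Any, Optional


def build_job_request(
    job_name: str,
    media_uri: str,
    language_code: str = "en-US",
    max_speakers: Optional[int] = None,
    vocabulary_name: Optional[str] = None,
    redact_pii: bool = False,
) -> dict[str, Any]:
    """Assemble keyword arguments for boto3's start_transcription_job (hypothetical helper)."""
    request: dict[str, Any] = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": language_code,
    }

    settings: dict[str, Any] = {}
    if max_speakers is not None:
        # Speaker identification (diarization): label up to max_speakers voices.
        settings["ShowSpeakerLabels"] = True
        settings["MaxSpeakerLabels"] = max_speakers
    if vocabulary_name is not None:
        # Custom vocabulary that must already exist (created with create_vocabulary).
        settings["VocabularyName"] = vocabulary_name
    if settings:
        request["Settings"] = settings

    if redact_pii:
        # Replace detected PII in the transcript with a [PII] tag.
        request["ContentRedaction"] = {
            "RedactionType": "PII",
            "RedactionOutput": "redacted",
        }
    return request


if __name__ == "__main__":
    print(build_job_request("MyJob", "s3://my-bucket/audio.mp3",
                            max_speakers=2, redact_pii=True))
```

The result can be passed directly as `transcribe.start_transcription_job(**request)`. Before a vocabulary can be referenced, it has to be created, for example with `transcribe.create_vocabulary(VocabularyName='brands', LanguageCode='en-US', Phrases=['Acme'])`. Here `'brands'` and `'Acme'` are placeholders.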

πŸ—οΈ Architecture Overview

               +------------------+
               |    Audio File    | (.mp3, .mp4, .wav, .flac)
               +--------+---------+
                        |
              +-------------------+
              |  Amazon Transcribe|
              +--------+----------+
                       |
               +------------------+
               |   Transcription  |
               |   JSON/Plaintext |
               +------------------+

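Batch transcripts are written to S3. This is either your own bucket (OutputBucketName) or a service-managed bucket that returns a temporary download URI. From there you can route them to:

  • Lambda (via S3 event notifications or EventBridge job-state-change events)

  • SNS/SQS (for notifications or queue-based processing)

  • Amazon Comprehend (for NLP)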
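Instead of polling, one way to react when a batch job finishes is an Amazon EventBridge rule on Transcribe's job-state-change events, with a Lambda function as its target. The sketch below assumes the event pattern and detail fields documented for Transcribe events: source `aws.transcribe`, detail-type `Transcribe Job State Change`. The handler name `on_job_state_change` and its return shape are illustrative only.

```python
import json
from typing import Any

# EventBridge event pattern: match Transcribe batch jobs that have finished.
JOB_STATE_PATTERN: dict[str, Any] = {
    "source": ["aws.transcribe"],
    "detail-type": ["Transcribe Job State Change"],
    "detail": {"TranscriptionJobStatus": ["COMPLETED", "FAILED"]},
}


def on_job_state_change(event: dict[str, Any], context: Any = None) -> dict[str, str]:
    """Lambda target for the rule (hypothetical name): extract job name and final status."""
    detail = event["detail"]
    result = {"job": detail["TranscriptionJobName"],
              "status": detail["TranscriptionJobStatus"]}
    # Next steps (not shown): fetch the transcript from S3, notify via SNS,
    # or send the text to Amazon Comprehend.
    return result


if __name__ == "__main__":
    # Paste this JSON into the rule's event pattern.
    print(json.dumps(JOB_STATE_PATTERN, indent=2))
```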


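💡 Input/Output Formats

| Aspect | Details |
|---|---|
| Input formats | .mp3, .mp4, .wav, .flac are the most common; .ogg, .amr, .webm, and .m4a are also accepted |
| Sample rate | No fixed 8/16 kHz requirement for batch jobs; 8 kHz is typical for telephone audio, and 16 kHz or higher is recommended otherwise |
| Output | JSON (full transcript plus word-level items); optional SRT/VTT subtitle files |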
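The batch API accepts a `MediaFormat` value that matches the file type, so a small helper can derive it from the file name or S3 key. This is a hypothetical helper. The accepted values below reflect the `MediaFormat` list in the API reference at the time of writing and are an assumption to re-check.

```python
from pathlib import PurePosixPath

# MediaFormat values accepted by start_transcription_job (assumed from the API
# reference at the time of writing; check the current list before relying on it).
SUPPORTED_MEDIA_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"}


def media_format_for(media_uri: str) -> str:
    """Derive the MediaFormat parameter from a file name or S3 URI."""
    extension = PurePosixPath(media_uri).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_MEDIA_FORMATS:
        raise ValueError(f"cannot infer a supported MediaFormat from {media_uri!r}")
    return extension


if __name__ == "__main__":
    print(media_format_for("s3://my-bucket/audio.mp3"))  # mp3
```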

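💸 Pricing Overview (2024)

| Type | Approximate price (US East, first volume tier) |
|---|---|
| Standard (batch) | ~$0.024 per audio minute |
| Medical | ~$0.075 per audio minute |
| Streaming | ~$0.0004 per second (≈ $0.024 per minute) |
| Custom vocabulary | No additional charge |
| Storage (S3) | Billed separately by S3 |

Usage is billed per second with a 15-second minimum per request; rates vary by region and monthly volume.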

🧠 Free Tier: 60 minutes/month for 12 months.
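The arithmetic below turns the per-second rate into a rough budget estimate. It applies the standard rate from the table ($0.0004 per second, i.e. $0.024 per minute) and assumes per-second billing that rounds partial seconds up, with a 15-second minimum per request. For example, a 90-minute recording is 5,400 s × $0.0004 = $2.16. Treat this as a sketch only: it ignores volume tiers, regional prices, add-on features, and the free tier.

```python
import math
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.0004")  # assumed standard rate from the table above (USD)
MINIMUM_BILLED_SECONDS = 15          # assumed per-request minimum charge


def estimate_cost(duration_seconds: float) -> Decimal:
    """Rough USD cost of transcribing one recording of the given length."""
    # Assumption: partial seconds are rounded up to the next whole second.
    billed_seconds = max(math.ceil(duration_seconds), MINIMUM_BILLED_SECONDS)
    return RATE_PER_SECOND * billed_seconds


if __name__ == "__main__":
    print(estimate_cost(90 * 60))  # 90-minute recording: 5400 s billed
    print(estimate_cost(5))        # short clip: billed as 15 s
```

`Decimal` is used so that the money arithmetic avoids binary floating-point rounding.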


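🔧 Sample Python Code (Boto3): Batch

```python
import boto3

# Region and credentials come from your AWS configuration (env vars, ~/.aws, or an IAM role).
transcribe = boto3.client('transcribe')

# Batch jobs are asynchronous: this call only queues the job and returns immediately.
response = transcribe.start_transcription_job(
    TranscriptionJobName='MyJob',          # must be unique in your account and Region
    Media={'MediaFileUri': 's3://my-bucket/audio.mp3'},
    MediaFormat='mp3',
    LanguageCode='en-US',
    OutputBucketName='my-output-bucket'    # transcript is written here as MyJob.json
)

print("Started:", response['TranscriptionJob']['TranscriptionJobName'])
```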
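Because `start_transcription_job` only queues the job, a script usually polls `get_transcription_job` until the status becomes `COMPLETED` or `FAILED`. Below is a minimal sketch assuming a fixed polling interval. The helper name `wait_for_job` and its timeout defaults are hypothetical. The client is passed in so that the loop can be exercised without AWS access.

```python
import time
from typing import Any


def wait_for_job(client: Any, job_name: str, poll_seconds: float = 10.0,
                 timeout_seconds: float = 3600.0) -> dict[str, Any]:
    """Poll get_transcription_job until the batch job completes or fails.

    `client` is a boto3 Transcribe client (or any object with the same method).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.get_transcription_job(TranscriptionJobName=job_name)["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]  # QUEUED, IN_PROGRESS, COMPLETED or FAILED
        if status == "COMPLETED":
            return job
        if status == "FAILED":
            raise RuntimeError(f"{job_name} failed: {job.get('FailureReason', 'unknown reason')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{job_name} still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

With the client from the sample above, `job = wait_for_job(transcribe, 'MyJob')` returns the finished job, and `job['Transcript']['TranscriptFileUri']` points to the transcript. For many concurrent jobs, event-driven notification (for example through EventBridge) avoids polling altogether.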

Example JSON Output

```json
{
  "jobName": "MyJob",
  "results": {
    "transcripts": [
      {"transcript": "Hello, welcome to our customer service hotline."}
    ],
    "items": [
      {
        "start_time": "0.54",
        "end_time": "0.96",
        "alternatives": [{"confidence": "1.0", "content": "Hello"}],
        "type": "pronunciation"
      }
    ]
  }
}
```
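The word-level `items` in the output above can be turned into (word, start, end) tuples, for example to build captions. This is a minimal sketch that assumes the structure shown. Timestamps arrive as strings. Punctuation items carry no `start_time` or `end_time`, so they are skipped. The function names are hypothetical.

```python
import json
from typing import Any


def load_transcript(path: str) -> dict[str, Any]:
    """Read a downloaded batch transcript (e.g. MyJob.json) from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def full_text(result: dict[str, Any]) -> str:
    """Join the transcript strings into one block of text."""
    return " ".join(t["transcript"] for t in result["results"]["transcripts"])


def words_with_times(result: dict[str, Any]) -> list[tuple[str, float, float]]:
    """Return (word, start_seconds, end_seconds) for each spoken word."""
    words = []
    for item in result["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # punctuation items have no timestamps
        best = item["alternatives"][0]  # first (usually the only) alternative
        words.append((best["content"], float(item["start_time"]), float(item["end_time"])))
    return words
```

Usage: `words_with_times(load_transcript("MyJob.json"))` on a local copy of the transcript. The file name is a placeholder.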

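📡 Real-time Streaming (HTTP/2 or WebSocket)

Amazon Transcribe supports real-time streaming transcription over HTTP/2 or WebSocket (e.g., for call center apps and live captioning).

You need:

  • Audio encoded as PCM (signed 16-bit little-endian), FLAC, or Ogg Opus, at a sample rate between 8 kHz and 48 kHz

  • A streaming client. Options include an AWS SDK with streaming support (e.g., the amazon-transcribe Python SDK; boto3 does not support the streaming API), your own WebSocket client with a SigV4-presigned URL, or a higher-level integration such as the Amazon Chime SDK's live transcription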
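The streaming API consumes audio as a sequence of small chunks rather than as one file. The sketch below slices a 16-bit WAV file into raw PCM chunks using only Python's `wave` module. For simplicity it accepts only mono audio. The 100 ms chunk length is an assumption. Sending the chunks, whether through the amazon-transcribe SDK or a WebSocket client, is not shown. At 16 kHz, a 100 ms chunk is 1,600 samples × 2 bytes = 3,200 bytes.

```python
import io
import wave
from typing import BinaryIO, Iterator, Union


def pcm_chunks(source: Union[str, BinaryIO], chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw 16-bit mono PCM chunks of about chunk_ms each from a WAV file."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        frames_per_chunk = wav.getframerate() * chunk_ms // 1000
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data  # each chunk would be sent as one audio event


if __name__ == "__main__":
    # Build one second of 16 kHz silence in memory instead of reading a real file.
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000)
    buffer.seek(0)
    sizes = [len(chunk) for chunk in pcm_chunks(buffer)]
    print(len(sizes), sizes[0])  # 10 chunks of 3200 bytes
```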


🩺 Amazon Transcribe Medical

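| Feature | Description |
|---|---|
| Specialty | PRIMARYCARE for batch jobs; streaming also supports CARDIOLOGY, NEUROLOGY, ONCOLOGY, RADIOLOGY, and UROLOGY |
| Type | CONVERSATION (doctor-patient dialogue) or DICTATION |
| Language | US English (en-US) only |
| HIPAA eligible | ✅ Yes (must sign a BAA) |
| Use case | Medical dictation, doctor-patient dialogue |
| Format | .wav, .mp3, .flac, etc. |
| Output | JSON transcript; optional PHI identification (PHI is labeled, not redacted) |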
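Batch jobs for Transcribe Medical use a separate API call, `start_medical_transcription_job`, which has a few extra required fields. Below is a minimal sketch that assumes boto3's parameter names for that call. The helper name `build_medical_request` is hypothetical.

```python
from typing import Any


def build_medical_request(job_name: str, media_uri: str, output_bucket: str,
                          dictation: bool = False,
                          identify_phi: bool = False) -> dict[str, Any]:
    """Keyword arguments for boto3's start_medical_transcription_job (batch)."""
    request: dict[str, Any] = {
        "MedicalTranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": "en-US",          # Transcribe Medical supports US English only
        "Specialty": "PRIMARYCARE",       # the only specialty accepted for batch jobs
        "Type": "DICTATION" if dictation else "CONVERSATION",
        "OutputBucketName": output_bucket,  # required for medical jobs
    }
    if identify_phi:
        # Labels protected health information in the output; it does not redact it.
        request["ContentIdentificationType"] = "PHI"
    return request
```

The dictionary is passed as `transcribe.start_medical_transcription_job(**request)`.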

βš™οΈ Terraform for IAM Setup (Transcribe Batch)

resource "aws_iam_role" "transcribe_role" {
  name = "transcribe-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect = "Allow",
      Principal = {
        Service = "transcribe.amazonaws.com"
      },
      Action = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "transcribe_policy" {
  name = "allow-transcribe-access"
  role = aws_iam_role.transcribe_role.id

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect = "Allow",
      Action = [
        "s3:GetObject",
        "s3:PutObject"
      ],
      Resource = "arn:aws:s3:::your-bucket-name/*"
    }]
  })
}

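🧠 Note: Terraform cannot start transcription jobs; use the Boto3 SDK directly or from Lambda. The AWS provider does manage some Transcribe resources, such as custom vocabularies (aws_transcribe_vocabulary), vocabulary filters, and custom language models.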
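A common pattern for starting jobs from Lambda is a function triggered by S3 uploads that starts one job per new object. The following is a minimal sketch that assumes a standard S3 event notification payload. The output bucket name is a placeholder. The job-name character rule (letters, digits, `.`, `_`, `-`, at most 200 characters) is assumed from the API reference. The Lambda's own execution role, not the Transcribe service role above, needs `transcribe:StartTranscriptionJob` and typically read access to the uploaded objects.

```python
import re
import urllib.parse
from typing import Any

# Characters outside this set are replaced, since job names are assumed to allow
# only letters, digits, '.', '_' and '-' (up to 200 characters).
_UNSAFE_NAME_CHARS = re.compile(r"[^0-9A-Za-z._-]")
OUTPUT_BUCKET = "my-output-bucket"  # placeholder: replace with your bucket


def job_request_from_s3_event(record: dict[str, Any], output_bucket: str) -> dict[str, Any]:
    """Turn one S3 'ObjectCreated' record into start_transcription_job arguments."""
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
    return {
        "TranscriptionJobName": _UNSAFE_NAME_CHARS.sub("-", key)[:200],
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "LanguageCode": "en-US",
        "OutputBucketName": output_bucket,
    }


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, list[str]]:
    """Entry point for an S3-triggered Lambda: start one job per uploaded object."""
    import boto3  # imported here so the helper above can be tested without boto3

    client = boto3.client("transcribe")
    started = []
    for record in event["Records"]:
        request = job_request_from_s3_event(record, OUTPUT_BUCKET)
        client.start_transcription_job(**request)
        started.append(request["TranscriptionJobName"])
    return {"started": started}
```

Job names must be unique per account and Region, so re-uploading the same key would make `start_transcription_job` fail. Appending a timestamp or UUID to the name is one way around this.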


πŸ” Security & Compliance

Feature Description
IAM Policies Secure access to S3, Transcribe APIs
KMS Encryption For S3 buckets and results
HIPAA βœ… Supported (Transcribe Medical)
Audit Trails βœ… CloudTrail supported

✅ TL;DR Summary

| Feature | Amazon Transcribe |
|---|---|
| Batch transcription | ✅ Yes |
| Real-time transcription | ✅ Yes (streaming) |
| Speaker identification | ✅ Yes |
| Custom vocabulary | ✅ Yes |
| Medical transcription | ✅ Yes (HIPAA-eligible) |
| Languages supported | ✅ 100+ (batch) |
| Terraform support | ⚠️ IAM, S3, and resources such as custom vocabularies; no job creation |
| Common integrations | S3, Lambda, Comprehend, SNS, Step Functions |