Polly

🧠 What is Amazon Polly?

Amazon Polly is a fully managed Text-to-Speech (TTS) service that turns text into lifelike spoken audio using advanced deep learning models.

✅ It offers natural-sounding neural voices, speech marks, speech customization, and multi-language support, which makes it well suited to building voice-enabled applications.


🎯 Common Use Cases

| Use Case | Description |
| --- | --- |
| 🎙️ Voice Assistants | Read back responses in apps (Alexa, chatbots) |
| 📖 Audiobook Generation | Convert articles/books to audio |
| 🛍️ eCommerce | Narrate product info for visually impaired users |
| 📚 eLearning | Narrate training or course content |
| 📲 Mobile Apps | Add speech functionality (news, weather, etc.) |
| 🔁 IVR Systems (Call Centers) | Play back dynamic speech in phone calls |

🔊 Voice Types

| Type | Description |
| --- | --- |
| Standard Voices | Traditional TTS using concatenative synthesis |
| Neural Voices | Deep-learning based, more natural sounding |
| Newscaster Style | Voice mimics a news reader (neural only) |
| Conversational Tones | Neural with dynamic pauses and emphasis |

🧠 Neural voices sound noticeably more natural than standard voices, but cost four times as much per character (see Pricing).
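As a rough illustration of the newscaster style, the sketch below wraps text in the `<amazon:domain name="news">` SSML tag and requests the neural engine. The helper names `newscaster_ssml` and `synthesize_news` are made up for this example, and `polly` is assumed to be a client created with `boto3.client("polly")`. The style is only offered on some neural voices, so treat the default `Matthew` as an assumption to check against the voice list.

```python
from xml.sax.saxutils import escape


def newscaster_ssml(text: str) -> str:
    """Wrap plain text in SSML that requests the newscaster speaking style."""
    # escape() replaces &, < and > so the text cannot break the SSML document.
    return (
        '<speak><amazon:domain name="news">'
        f"{escape(text)}"
        "</amazon:domain></speak>"
    )


def synthesize_news(polly, text: str, voice_id: str = "Matthew") -> bytes:
    """Synthesize `text` in newscaster style and return the MP3 bytes.

    `polly` is assumed to be a boto3 Polly client.
    """
    response = polly.synthesize_speech(
        Engine="neural",  # the newscaster style requires a neural voice
        TextType="ssml",
        Text=newscaster_ssml(text),
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    return response["AudioStream"].read()
```

Passing the client in as a parameter keeps the helper easy to test without AWS credentials.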


🌐 Supported Languages & Voices

  • Dozens of languages and language variants (the voice list in the Polly documentation has the current set)

  • 100+ voices, including:

    • English (US, UK, IN, AU)

    • Hindi

    • Spanish, French, German, Italian

    • Japanese, Chinese, Korean, Arabic

✅ You can select male/female, neural/standard, and language-specific variants.
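Since the available voices change over time, it can be safer to query them than to hard-code a list. The sketch below uses the `DescribeVoices` API through boto3 and follows `NextToken` pagination. The function name `list_voices` and the shape of the returned dictionaries are choices made for this example, and `polly` is assumed to be a `boto3.client("polly")` instance.

```python
def list_voices(polly, language_code=None, engine=None):
    """Return a list of voice summaries, optionally filtered.

    `polly` is assumed to be a boto3 Polly client. `language_code` is a
    value such as "en-US"; `engine` is a value such as "standard" or "neural".
    """
    params = {}
    if language_code:
        params["LanguageCode"] = language_code
    if engine:
        params["Engine"] = engine

    voices = []
    while True:
        page = polly.describe_voices(**params)
        for voice in page["Voices"]:
            voices.append(
                {
                    "id": voice["Id"],
                    "gender": voice["Gender"],
                    "language": voice["LanguageCode"],
                    "engines": voice.get("SupportedEngines", []),
                }
            )
        token = page.get("NextToken")
        if not token:
            return voices
        # Ask for the next page of results.
        params["NextToken"] = token
```

For example, `list_voices(polly, language_code="hi-IN")` would return the Hindi voices available to the account's region.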


🧰 Key Features

| Feature | Description |
| --- | --- |
| SSML support | Add pauses, emphasis, pitch, rate, and other controls (some tags, such as emphasis, are not available with neural voices) |
| Speech marks | Get timestamps for sentences, words, visemes, and SSML marks, e.g. for subtitle sync |
| Lexicon support | Customize pronunciation of names, acronyms, etc. |
| Neural voices | High-quality, natural-sounding speech |
| S3 integration | Save synthesized speech directly to a bucket (asynchronous synthesis tasks) |
| Real-time streaming | Stream speech output as it's generated |
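To make the speech marks feature more concrete: when `OutputFormat` is `"json"` and `SpeechMarkTypes` is set, Polly returns one JSON object per line instead of audio. The sketch below parses that output into word timings. The helper names are invented for this example, and `polly` is assumed to be a `boto3.client("polly")` instance.

```python
import json


def parse_speech_marks(raw: bytes) -> list:
    """Parse Polly's line-delimited JSON speech marks into dictionaries."""
    return [
        json.loads(line)
        for line in raw.decode("utf-8").splitlines()
        if line.strip()
    ]


def word_timings(polly, text: str, voice_id: str = "Joanna") -> list:
    """Return (milliseconds from start, word) pairs for `text`."""
    response = polly.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        OutputFormat="json",       # speech marks, not audio
        SpeechMarkTypes=["word"],  # other types: sentence, viseme, ssml
    )
    marks = parse_speech_marks(response["AudioStream"].read())
    return [(mark["time"], mark["value"]) for mark in marks]
```

The audio itself comes from a second `synthesize_speech` call with the same text and voice; the timings can then drive subtitle highlighting.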

🔧 Sample Python Code (Boto3)

🔈 Convert Text to Audio (MP3)

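import boto3

# Credentials and region come from the default AWS configuration chain.
polly = boto3.client("polly")

# Synchronous synthesis: the response carries the audio as a streaming body.
response = polly.synthesize_speech(
    Text="Hello, this is Amazon Polly speaking!",
    OutputFormat="mp3",
    VoiceId="Joanna",
)

# Write the MP3 bytes to disk.
with open("speech.mp3", "wb") as file:
    file.write(response["AudioStream"].read())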
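Reading the whole `AudioStream` at once is fine for short text. For longer input, a sketch like the following writes the audio in chunks as it arrives, using the `iter_chunks` method of the botocore streaming body. The function name `stream_to_file` and the 4096-byte chunk size are arbitrary choices for this example, and `polly` is assumed to be a `boto3.client("polly")` instance.

```python
def stream_to_file(polly, text, path, voice_id="Joanna", chunk_size=4096):
    """Synthesize `text` and write the MP3 to `path` chunk by chunk.

    Returns the number of bytes written.
    """
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    stream = response["AudioStream"]
    written = 0
    try:
        with open(path, "wb") as out:
            # Each chunk is written as soon as it is received.
            for chunk in stream.iter_chunks(chunk_size=chunk_size):
                out.write(chunk)
                written += len(chunk)
    finally:
        # Release the underlying HTTP connection.
        stream.close()
    return written
```

The same loop could forward chunks to an audio player or an HTTP response instead of a file.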

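🔠 Use SSML (Speech Synthesis Markup Language)

import boto3

polly = boto3.client("polly")

# TextType="ssml" tells Polly to interpret the tags instead of reading them aloud.
# <emphasis> is not supported by neural voices, so no neural Engine is requested here.
response = polly.synthesize_speech(
    TextType="ssml",
    Text="""
        <speak>
            Welcome to <emphasis level="strong">Amazon Polly</emphasis>.
            <break time="500ms"/> I can read text naturally.
        </speak>
    """,
    OutputFormat="mp3",
    VoiceId="Matthew",
)

with open("speech_ssml.mp3", "wb") as file:
    file.write(response["AudioStream"].read())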

🪙 Pricing (2024)

| Tier | Price |
| --- | --- |
| Standard voice | $4.00 per 1 million characters |
| Neural voice | $16.00 per 1 million characters |
| Free Tier | 5 million characters/month (standard) for 12 months |

🧠 You are billed per character, not per audio file size.
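Because billing is per character, a cost estimate is simple arithmetic. The sketch below uses the 2024 prices from the table above; `estimate_cost` is a name invented for this example, and it deliberately ignores the free tier and any regional price differences.

```python
from decimal import Decimal

# USD per 1 million characters, taken from the pricing table above.
PRICE_PER_MILLION = {
    "standard": Decimal("4.00"),
    "neural": Decimal("16.00"),
}


def estimate_cost(characters: int, engine: str = "standard") -> Decimal:
    """Estimate the cost in USD of synthesizing `characters` characters.

    Ignores the free tier; rounds to whole cents.
    """
    cost = Decimal(characters) / Decimal(1_000_000) * PRICE_PER_MILLION[engine]
    return cost.quantize(Decimal("0.01"))


print(estimate_cost(250_000, "neural"))  # 4.00
print(estimate_cost(1_500_000))          # 6.00
```

Using `Decimal` rather than floats avoids rounding surprises when the amounts are summed across many requests.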


πŸŽ›οΈ Terraform Setup for IAM Role (used with Lambda or Polly)

While Amazon Polly itself doesn’t need to be provisioned (it’s an API), you’ll likely use it with other services like Lambda.

resource "aws_iam_role" "lambda_polly_role" {
  name = "lambda-polly-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect = "Allow",
      Principal = {
        Service = "lambda.amazonaws.com"
      },
      Action = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_policy" "polly_access" {
  name = "PollyAccess"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = [
          "polly:SynthesizeSpeech"
        ],
        Resource = "*"
      },
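      {
        # Least privilege: only the actions Lambda needs to write CloudWatch logs.
        Effect = "Allow",
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ],
        Resource = "*"
      }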
    ]
  })
}

resource "aws_iam_role_policy_attachment" "attach" {
  role       = aws_iam_role.lambda_polly_role.name
  policy_arn = aws_iam_policy.polly_access.arn
}
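A Lambda function running under this role might look roughly like the sketch below. The event shape (`{"text": ..., "voice": ...}`), the proxy-style response, and the `make_handler` factory are all assumptions made for this example rather than anything the role requires.

```python
import base64


def make_handler(polly):
    """Build a Lambda handler around a boto3 Polly client.

    In a real function you would write, at module level:
        import boto3
        handler = make_handler(boto3.client("polly"))
    """

    def handler(event, context):
        text = event.get("text", "")
        if not text:
            return {"statusCode": 400, "body": "Missing 'text'"}

        response = polly.synthesize_speech(
            Text=text,
            OutputFormat="mp3",
            VoiceId=event.get("voice", "Joanna"),
        )
        audio = response["AudioStream"].read()

        # Binary payloads are returned base64-encoded.
        return {
            "statusCode": 200,
            "isBase64Encoded": True,
            "headers": {"Content-Type": "audio/mpeg"},
            "body": base64.b64encode(audio).decode("ascii"),
        }

    return handler
```

Creating the client once, outside the handler, lets warm invocations reuse it.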

📦 Polly with Other AWS Services

| AWS Service | Integration |
| --- | --- |
| Lambda | Call Polly to generate voice dynamically |
| S3 | Store speech output |
| CloudFront | Serve stored MP3s via public URLs |
| Lex / Connect | Use Polly voices for conversational bots |
| IoT Core | Voice notifications to devices |
| Step Functions | Build audio generation pipelines |
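For the S3 integration, Polly's asynchronous `StartSpeechSynthesisTask` API writes the audio to a bucket itself. The sketch below starts a task and polls `GetSpeechSynthesisTask` until it finishes. The function name, key prefix, and polling limits are assumptions for this example, and the caller would need `polly:StartSpeechSynthesisTask`, `polly:GetSpeechSynthesisTask`, and `s3:PutObject` permissions in addition to `polly:SynthesizeSpeech`.

```python
import time


def synthesize_to_s3(polly, text, bucket, prefix="polly/", voice_id="Joanna",
                     poll_seconds=2.0, max_polls=30):
    """Start an asynchronous synthesis task and return the output URI.

    `polly` is assumed to be a boto3 Polly client; `bucket` must already exist.
    """
    task = polly.start_speech_synthesis_task(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        OutputS3BucketName=bucket,
        OutputS3KeyPrefix=prefix,
    )["SynthesisTask"]

    polls = 0
    while True:
        status = task["TaskStatus"]
        if status == "completed":
            return task["OutputUri"]
        if status == "failed":
            raise RuntimeError(task.get("TaskStatusReason", "synthesis task failed"))
        if polls >= max_polls:
            raise TimeoutError(f"task {task['TaskId']} still {status}")
        time.sleep(poll_seconds)
        task = polly.get_speech_synthesis_task(TaskId=task["TaskId"])["SynthesisTask"]
        polls += 1
```

In a pipeline, the polling loop could be replaced by a Step Functions wait state or by an SNS notification on task completion.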

🔐 Security & Access Control

  • IAM policies control who may call Polly actions such as polly:SynthesizeSpeech

  • Polly can be called from:

    • Lambda

    • EC2/Containers

    • Front-end apps via API Gateway (requests authorized with a signed token)


✅ TL;DR Summary

| Feature | Amazon Polly |
| --- | --- |
| Speech Synthesis | ✅ Yes |
| Neural Voices | ✅ Yes (high quality) |
| Real-time Streaming | ✅ Yes |
| Language Support | ✅ Dozens of languages and variants |
| SSML Customization | ✅ Yes (pitch, pauses, emphasis) |
| Cost | Billed per million characters |
| Free Tier | ✅ 5M standard characters/month (1st year) |
| Use Cases | Voice bots, subtitles, training audio, IVRs |