Polly

🧠 What is Amazon Polly?

Amazon Polly is a fully managed Text-to-Speech (TTS) service that turns text into lifelike spoken audio using advanced deep learning models.

✅ It supports natural-sounding neural voices, speech marks, SSML-based speech customization, and many languages, which makes it a good fit for voice-enabled applications.

🎯 Common Use Cases

Use Case	Description
🎙️ Voice Assistants	Read back responses in apps (Alexa, chatbots)
📖 Audiobook Generation	Convert articles/books to audio
🛍️ eCommerce	Narrate product information for visually impaired shoppers
📚 eLearning	Narrate training or course content
📲 Mobile Apps	Add speech functionality (news, weather, etc.)
🔁 IVR Systems (Call Centers)	Play back dynamic speech in phone calls

🔊 Voice Types

Type	Description
Standard Voices	Traditional TTS using concatenative synthesis
Neural Voices	Deep-learning based, more natural sounding
Newscaster Style	Voice mimics a news reader (neural only)
Conversational Tones	Neural with dynamic pauses and emphasis

🧠 Neural voices generally sound more natural than standard voices, but they cost more per character (see Pricing).
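
The engine and speaking style are chosen per request. As a minimal sketch, the helper below builds the arguments for a Newscaster-style request. It assumes the `Engine` parameter of Boto3's `synthesize_speech` and the `<amazon:domain name="news">` SSML tag. `newscaster_request` is a hypothetical helper name, and only some neural voices (Matthew and Joanna among them) support this style.

```python
from xml.sax.saxutils import escape


def newscaster_request(text: str, voice_id: str = "Matthew") -> dict:
    """Build keyword arguments for synthesize_speech using the news style."""
    # Escape &, <, and > so arbitrary text cannot break the SSML document.
    ssml = f'<speak><amazon:domain name="news">{escape(text)}</amazon:domain></speak>'
    return {
        "Engine": "neural",  # the news style requires the neural engine
        "TextType": "ssml",
        "Text": ssml,
        "OutputFormat": "mp3",
        "VoiceId": voice_id,  # assumption: the chosen voice supports the style
    }
```

The result can be passed straight to a client, for example `boto3.client("polly").synthesize_speech(**newscaster_request("Top stories this hour"))`.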

🌐 Supported Languages & Voices

60+ languages and dialects
100+ voices, including:
- English (US, UK, IN, AU)
- Hindi
- Spanish, French, German, Italian
- Japanese, Chinese, Korean, Arabic

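✅ You can choose voices by gender, engine (neural or standard), and language variant. Availability differs by engine and Region, so check the current voice list before relying on a specific language.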
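
One way to check which voices exist for a language is to query the service. This is a minimal sketch built on Boto3's `describe_voices` call, following `NextToken` in case the result is paginated. `list_voice_ids` is a hypothetical helper, and the client is passed in so the function can be exercised with a stub.

```python
def list_voice_ids(polly_client, language_code: str, engine: str = "neural") -> list:
    """Return the sorted voice IDs available for one language and engine."""
    voice_ids = []
    kwargs = {"LanguageCode": language_code, "Engine": engine}
    while True:
        page = polly_client.describe_voices(**kwargs)
        voice_ids.extend(voice["Id"] for voice in page["Voices"])
        token = page.get("NextToken")
        if not token:
            return sorted(voice_ids)
        # Ask for the next page of results.
        kwargs["NextToken"] = token
```

With real credentials this would be called as `list_voice_ids(boto3.client("polly"), "en-US")`.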

🧰 Key Features

Feature	Description
SSML support	Add pauses, emphasis, pitch, rate, and other controls (tag support varies by engine)
Speech marks	Get timestamps for sentences, words, visemes, and SSML marks, for example to sync subtitles
Lexicon support	Customize pronunciation of names, acronyms, etc.
Neural voices	High-quality, natural-sounding speech
S3 integration	Save synthesized speech directly to a bucket using asynchronous synthesis tasks
Real-time streaming	Stream speech output as it's generated
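
Speech marks are returned as newline-delimited JSON rather than audio. The sketch below shows one way to turn that output into word cues for subtitle sync. `parse_speech_marks` and `word_cues` are hypothetical helper names. The field names (`time`, `type`, `value`) follow the speech mark format, with `time` given in milliseconds from the start of the audio.

```python
import json


def parse_speech_marks(raw: bytes) -> list:
    """Parse newline-delimited JSON speech marks into a list of dicts."""
    lines = raw.decode("utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def word_cues(marks: list) -> list:
    """Return (offset_ms, word) pairs for every word-level speech mark."""
    return [(mark["time"], mark["value"]) for mark in marks if mark["type"] == "word"]
```

The raw bytes would typically come from a request such as `synthesize_speech(Text=..., OutputFormat="json", SpeechMarkTypes=["word"], VoiceId="Joanna")["AudioStream"].read()`.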

🔧 Sample Python Code (Boto3)

🔈 Convert Text to Audio (MP3)

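import boto3

# Create a Polly client using the default credentials and Region.
polly = boto3.client("polly")

# Synthesize a short plain-text phrase as MP3 with the Joanna voice.
response = polly.synthesize_speech(
    Text="Hello, this is Amazon Polly speaking!",
    OutputFormat="mp3",
    VoiceId="Joanna"
)

# AudioStream is a streaming body; read it and save the bytes to disk.
with open("speech.mp3", "wb") as file:
    file.write(response["AudioStream"].read())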
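
For longer audio it can be preferable to copy the stream in chunks instead of reading it into memory at once. This is a minimal sketch. `save_audio_stream` is a hypothetical helper that works with any object exposing `read(size)` and `close()`, which includes the `AudioStream` body in the response.

```python
from contextlib import closing


def save_audio_stream(stream, path: str, chunk_size: int = 4096) -> int:
    """Copy a readable byte stream to a file in chunks; return bytes written."""
    written = 0
    # closing() makes sure the stream is closed even if writing fails.
    with closing(stream), open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

Usage would look like `save_audio_stream(response["AudioStream"], "speech.mp3")`.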

🔠 Use SSML (Speech Synthesis Markup Language)

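import boto3

polly = boto3.client("polly")

# TextType="ssml" makes Polly interpret the markup instead of reading it aloud.
# <emphasis> works with standard voices (the default engine here), not neural ones.
response = polly.synthesize_speech(
    TextType="ssml",
    Text="""
        <speak>
            Welcome to <emphasis level="strong">Amazon Polly</emphasis>.
            <break time="500ms"/> I can read text naturally.
        </speak>
    """,
    OutputFormat="mp3",
    VoiceId="Matthew"
)

# Save the synthesized audio to disk.
with open("speech_ssml.mp3", "wb") as file:
    file.write(response["AudioStream"].read())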
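
When SSML is assembled from user-supplied text, characters such as `&` and `<` need escaping or the request will be rejected as invalid markup. The sketch below is one way to do that. `build_ssml` is a hypothetical helper that joins sentences with a fixed `<break>` between them.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences: list, pause_ms: int = 500) -> str:
    """Wrap sentences in <speak>, separated by a pause of pause_ms milliseconds."""
    pause = f'<break time="{pause_ms}ms"/>'
    # escape() replaces &, <, and > with XML entities.
    body = pause.join(escape(sentence) for sentence in sentences)
    return f"<speak>{body}</speak>"
```

The returned string can be passed as `Text` together with `TextType="ssml"`.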

🪙 Pricing (2024)

Tier	Price
Standard voice	$4.00 per 1 million characters
Neural voice	$16.00 per 1 million characters
Free Tier	5 million characters/month (standard) for 12 months

🧠 You are billed per character, not per audio file size.
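
Because billing is per character, a rough estimate is simple arithmetic. The sketch below uses the per-million prices from the table above. `estimate_cost_usd` is a hypothetical helper and ignores the free tier and any Region-specific pricing.

```python
# Prices in USD per 1 million characters, taken from the table above.
PRICE_PER_MILLION_USD = {"standard": 4.00, "neural": 16.00}


def estimate_cost_usd(characters: int, engine: str = "standard") -> float:
    """Estimate the synthesis cost for a number of billed characters."""
    return characters / 1_000_000 * PRICE_PER_MILLION_USD[engine]


# 250,000 characters with a neural voice: 0.25 * 16.00 = 4.0 USD
print(estimate_cost_usd(250_000, "neural"))
```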

🎛️ Terraform Setup for IAM Role (for a Lambda function that calls Polly)

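Amazon Polly is an API, so there is nothing to provision for the service itself. What you typically provision is the IAM role for the compute that calls it, such as a Lambda function. The configuration below creates that role and grants it permission to synthesize speech and write logs.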

resource "aws_iam_role" "lambda_polly_role" {
  name = "lambda-polly-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect = "Allow",
      Principal = {
        Service = "lambda.amazonaws.com"
      },
      Action = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_policy" "polly_access" {
  name = "PollyAccess"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = [
          "polly:SynthesizeSpeech"
        ],
        Resource = "*"
      },
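      {
        # Only the logging actions a Lambda function needs, instead of logs:*.
        Effect = "Allow",
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ],
        Resource = "*"
      }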
    ]
  })
}

resource "aws_iam_role_policy_attachment" "attach" {
  role       = aws_iam_role.lambda_polly_role.name
  policy_arn = aws_iam_policy.polly_access.arn
}
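
A function running under this role might look like the following sketch. The event shape (`text` and an optional `voice` key) and the base64 response format are assumptions for illustration. The optional `polly` argument is there only so the handler can be tested with a stub, because Lambda itself calls `handler(event, context)`.

```python
import base64


def handler(event, context, polly=None):
    """Synthesize event["text"] and return the MP3 as a base64-encoded body."""
    if polly is None:
        import boto3  # imported lazily so tests can inject a stub client

        polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=event["text"],
        OutputFormat="mp3",
        VoiceId=event.get("voice", "Joanna"),  # hypothetical default voice
    )
    audio = response["AudioStream"].read()
    return {
        "statusCode": 200,
        "isBase64Encoded": True,
        "headers": {"Content-Type": "audio/mpeg"},
        "body": base64.b64encode(audio).decode("ascii"),
    }
```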

📦 Polly with Other AWS Services

AWS Service	Integration
Lambda	Call Polly to generate voice dynamically
S3	Store speech output
CloudFront	Deliver MP3s stored in S3 through a CDN
Lex / Connect	Use Polly voices for conversational bots
IoT Core	Voice notifications to devices
Step Functions	Build audio generation pipelines
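
For the S3 integration, Polly can write the audio to a bucket itself through an asynchronous task. This is a minimal sketch using Boto3's `start_speech_synthesis_task`. `start_s3_synthesis`, the bucket name, and the key prefix are hypothetical. The caller would also need `polly:StartSpeechSynthesisTask` and `s3:PutObject` permissions, which a policy that only allows `polly:SynthesizeSpeech` does not grant.

```python
def start_s3_synthesis(polly_client, text: str, bucket: str,
                       prefix: str = "speech/", voice_id: str = "Joanna"):
    """Start an asynchronous synthesis task and return (task_id, output_uri)."""
    response = polly_client.start_speech_synthesis_task(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        OutputS3BucketName=bucket,
        OutputS3KeyPrefix=prefix,
    )
    task = response["SynthesisTask"]
    # The audio appears at OutputUri once the task status becomes "completed".
    return task["TaskId"], task["OutputUri"]
```

The task status can be polled afterwards with `get_speech_synthesis_task(TaskId=...)`.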

🔐 Security & Access Control

IAM policies control access to Polly actions such as polly:SynthesizeSpeech.
Polly can be called from:
- Lambda
- EC2/Containers
- Front-end clients through API Gateway (using authenticated requests, such as a signed token)

✅ TL;DR Summary

Feature	Amazon Polly
Speech Synthesis	✅ Yes
Neural Voices	✅ Yes (High quality)
Real-time Streaming	✅ Yes
Language Support	✅ 60+
SSML Customization	✅ Yes (with pitch, pause, emphasis)
Cost	Per million characters
Free Tier	✅ 5M standard characters/month (1st year)
Use Cases	Voice bots, subtitles, training audio, IVRs