Polly
What is Amazon Polly?
Amazon Polly is a fully managed Text-to-Speech (TTS) service that turns text into lifelike spoken audio using advanced deep learning models.
It offers natural-sounding neural voices, speech marks, speech customization, and support for many languages, which makes it a good fit for voice-enabled applications.
Common Use Cases
| Use Case | Description |
|---|---|
| Voice Assistants | Read back responses in apps (Alexa, chatbots) |
| Audiobook Generation | Convert articles/books to audio |
| eCommerce | Narrate product information for visually impaired users |
| eLearning | Narrate training or course content |
| Mobile Apps | Add speech functionality (news, weather, etc.) |
| IVR Systems (Call Centers) | Play back dynamic speech in phone calls |
Voice Types
| Type | Description |
|---|---|
| Standard Voices | Traditional TTS using concatenative synthesis |
| Neural Voices | Deep-learning based, more natural sounding |
| Newscaster Style | Voice mimics a news reader (neural only) |
| Conversational Tones | Neural with dynamic pauses and emphasis |
Neural voices generally sound more natural than standard voices, but they cost more per character (see Pricing).
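As a rough illustration of how the voice types in the table map to API parameters, the sketch below builds the keyword arguments for `synthesize_speech`. The `Engine` parameter and the `<amazon:domain name="news">` SSML tag belong to Polly; the helper names `build_request` and `synthesize_to_file` are hypothetical, and the sketch assumes a voice that supports the newscaster style (Matthew here).

```python
from xml.sax.saxutils import escape


def build_request(text, voice_id="Matthew", engine="neural", newscaster=False):
    """Build keyword arguments for polly.synthesize_speech (hypothetical helper)."""
    if newscaster and engine != "neural":
        raise ValueError("the newscaster style requires the neural engine")
    request = {"Engine": engine, "OutputFormat": "mp3", "VoiceId": voice_id}
    if newscaster:
        # The news domain is expressed in SSML, so the text must be XML-escaped.
        request["TextType"] = "ssml"
        request["Text"] = (
            '<speak><amazon:domain name="news">'
            + escape(text)
            + "</amazon:domain></speak>"
        )
    else:
        request["TextType"] = "text"
        request["Text"] = text
    return request


def synthesize_to_file(request, path):
    """Send the request to Polly and save the audio (needs AWS credentials)."""
    import boto3  # imported here so build_request works without boto3 installed

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**request)
    with open(path, "wb") as out:
        out.write(response["AudioStream"].read())
```

For example, `synthesize_to_file(build_request("Markets closed higher today.", newscaster=True), "news.mp3")` would request the newscaster style, while `build_request("Hello", engine="standard")` produces a plain standard-voice request.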
Supported Languages & Voices
- 60+ languages and dialects
- 100+ voices, including:
  - English (US, UK, IN, AU, CA)
  - Hindi, Tamil, Malayalam
  - Spanish, French, German, Italian
  - Japanese, Chinese, Korean, Arabic

You can select male or female, neural or standard, and language-specific variants. Not every voice supports both engines, so check the current voice list for the language you need.
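One way to check which voices exist for a language is the `describe_voices` API. The following is a minimal sketch: `filter_voices` and `list_voices` are hypothetical helper names, and the field names (`Id`, `Gender`, `SupportedEngines`) follow the shape of the `Voices` entries in the `describe_voices` response.

```python
def filter_voices(voices, gender=None, engine=None):
    """Filter entries shaped like the "Voices" list returned by describe_voices."""
    selected = []
    for voice in voices:
        if gender is not None and voice.get("Gender") != gender:
            continue
        if engine is not None and engine not in voice.get("SupportedEngines", []):
            continue
        selected.append(voice["Id"])
    return sorted(selected)


def list_voices(language_code):
    """Fetch all voices for one language code, following NextToken pagination."""
    import boto3  # needs AWS credentials; imported lazily so filter_voices works offline

    polly = boto3.client("polly")
    voices, kwargs = [], {"LanguageCode": language_code}
    while True:
        page = polly.describe_voices(**kwargs)
        voices.extend(page["Voices"])
        if "NextToken" not in page:
            return voices
        kwargs["NextToken"] = page["NextToken"]
```

A call such as `filter_voices(list_voices("en-US"), gender="Female", engine="neural")` would then return the IDs of the female US English voices that support the neural engine.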
Key Features
| Feature | Description |
|---|---|
| SSML support | Add pauses, emphasis, pitch, rate, and other controls (some tags, such as emphasis and pitch, work only with standard voices) |
| Speech marks | Get timestamps for words, sentences, visemes, and SSML marks, e.g. for subtitle or lip sync |
| Lexicon support | Customize pronunciation of names, acronyms, etc. |
| Neural voices | High-quality, natural-sounding speech |
| S3 integration | Save synthesized speech directly to a bucket through asynchronous synthesis tasks |
| Real-time streaming | Stream speech output as it's generated |
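To make the speech marks feature more concrete, here is a small sketch of requesting word-level marks and turning them into caption timings. Polly returns speech marks as newline-delimited JSON when `OutputFormat` is `"json"`, with `time` given in milliseconds; the helper names `parse_speech_marks`, `to_captions`, and `fetch_word_marks` are hypothetical.

```python
import json


def parse_speech_marks(raw):
    """Parse Polly's newline-delimited JSON speech marks into a list of dicts."""
    text = raw.decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_captions(marks):
    """Turn word marks into (seconds, word) pairs for subtitle timing."""
    return [(m["time"] / 1000, m["value"]) for m in marks if m["type"] == "word"]


def fetch_word_marks(text, voice_id="Joanna"):
    """Request word-level speech marks from Polly (needs AWS credentials)."""
    import boto3  # imported lazily so the parsing helpers work offline

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="json",  # speech marks are returned instead of audio
        SpeechMarkTypes=["word"],
        VoiceId=voice_id,
    )
    return parse_speech_marks(response["AudioStream"].read())
```

The audio itself comes from a second `synthesize_speech` call with an audio output format; the timings from `to_captions` can then be aligned against that audio.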
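Sample Python Code (Boto3)
Convert Text to Audio (MP3)
import boto3

# Create a Polly client; region and credentials come from the standard AWS configuration.
polly = boto3.client("polly")

# Synthesize a short phrase with the "Joanna" voice and request MP3 output.
response = polly.synthesize_speech(
    Text="Hello, this is Amazon Polly speaking!",
    OutputFormat="mp3",
    VoiceId="Joanna",
)

# AudioStream is a streaming body; read it fully and write the bytes to disk.
with open("speech.mp3", "wb") as file:
    file.write(response["AudioStream"].read())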
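For longer texts it can be preferable to copy the audio in chunks rather than reading it all into memory at once. The sketch below assumes only that the stream has a `read(size)` method, which the `AudioStream` returned by Boto3 provides; `save_stream` is a hypothetical helper name.

```python
def save_stream(stream, path, chunk_size=4096):
    """Copy a file-like audio stream to disk in chunks; returns bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break  # an empty read means the stream is exhausted
            out.write(chunk)
            written += len(chunk)
    return written
```

With a `response` from `synthesize_speech`, this would be used as `save_stream(response["AudioStream"], "speech.mp3")`.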
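Use SSML (Speech Synthesis Markup Language)
import boto3

polly = boto3.client("polly")

# TextType="ssml" tells Polly to interpret the text as SSML markup.
# No Engine is given, so the standard engine is used, which supports <emphasis>.
response = polly.synthesize_speech(
    TextType="ssml",
    Text=(
        "<speak>"
        'Welcome to <emphasis level="strong">Amazon Polly</emphasis>. '
        '<break time="500ms"/> I can read text naturally.'
        "</speak>"
    ),
    OutputFormat="mp3",
    VoiceId="Matthew",
)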
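When SSML is assembled from user-supplied text, characters such as `&` and `<` need to be escaped or the markup becomes invalid. The following sketch shows one way to do that with the standard library; `build_ssml` is a hypothetical helper, and the 500 ms default pause simply mirrors the example above.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(sentences, pause_ms=500):
    """Join sentences into one SSML document with a pause between them."""
    # quoteattr adds the surrounding quotes for the attribute value.
    pause = "<break time=" + quoteattr(f"{int(pause_ms)}ms") + "/>"
    # escape() replaces &, < and > so arbitrary text cannot break the markup.
    body = pause.join(escape(sentence) for sentence in sentences)
    return f"<speak>{body}</speak>"
```

The result can be passed as `Text` together with `TextType="ssml"`.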
Pricing (2024)
| Tier | Price |
|---|---|
| Standard voice | $4.00 per 1 million characters |
| Neural voice | $16.00 per 1 million characters |
| Free Tier | 5 million characters/month (standard) for 12 months |
You are billed per character, not per audio file size.
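As a worked example of per-character billing, the sketch below estimates cost from a character count using the prices in the table above. The function name and the `free_characters` parameter are hypothetical, and real bills may differ with region and pricing changes.

```python
# Prices in USD per 1 million characters, taken from the pricing table above.
PRICE_PER_MILLION_USD = {"standard": 4.00, "neural": 16.00}


def estimate_cost(characters, engine="standard", free_characters=0):
    """Estimate the cost in USD for a number of synthesized characters."""
    billable = max(0, characters - free_characters)
    return billable * PRICE_PER_MILLION_USD[engine] / 1_000_000
```

For instance, 250,000 characters with a neural voice come to 250,000 × 16 / 1,000,000 = $4.00, and 6 million standard characters with 5 million free characters also come to $4.00.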
Terraform Setup for IAM Role (used with Lambda or Polly)
While Amazon Polly itself doesn't need to be provisioned (it's an API), you'll likely use it with other services like Lambda.
# Execution role that the Lambda service is allowed to assume.
resource "aws_iam_role" "lambda_polly_role" {
  name = "lambda-polly-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect = "Allow",
      Principal = {
        Service = "lambda.amazonaws.com"
      },
      Action = "sts:AssumeRole"
    }]
  })
}

# Permissions for the function: call Polly and write its CloudWatch logs.
resource "aws_iam_policy" "polly_access" {
  name = "PollyAccess"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = [
          "polly:SynthesizeSpeech"
        ],
        Resource = "*"
      },
      {
        # CloudWatch Logs actions needed by a Lambda function.
        Effect = "Allow",
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ],
        Resource = "*"
      }
    ]
  })
}

# Attach the policy to the Lambda execution role.
resource "aws_iam_role_policy_attachment" "attach" {
  role       = aws_iam_role.lambda_polly_role.name
  policy_arn = aws_iam_policy.polly_access.arn
}
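A Lambda function running under this role might look roughly like the sketch below. The event shape (`text` and optional `voice` keys) and the API Gateway style response are assumptions for illustration, and the optional `polly` argument exists only so that the handler can be exercised with a stub client.

```python
import base64


def handler(event, context, polly=None):
    """Lambda entry point: synthesize event["text"] and return base64 MP3 audio."""
    if polly is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        polly = boto3.client("polly")
    text = event.get("text", "")
    if not text:
        return {"statusCode": 400, "body": "missing 'text'"}
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=event.get("voice", "Joanna"),
    )
    audio = response["AudioStream"].read()
    # Binary audio is base64-encoded so it can travel in a JSON response.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "audio/mpeg"},
        "isBase64Encoded": True,
        "body": base64.b64encode(audio).decode("ascii"),
    }
```

Only `polly:SynthesizeSpeech` and the logging permissions from the policy above are needed for this handler.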
Polly with Other AWS Services
| AWS Service | Integration |
|---|---|
| Lambda | Call Polly to generate voice dynamically |
| S3 | Store speech output |
| CloudFront | Serve MP3s as public URLs |
| Lex / Connect | Use Polly voices for conversational bots |
| IoT Core | Voice notifications to devices |
| Step Functions | Build audio generation pipelines |
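For the S3 integration in particular, Polly offers asynchronous synthesis tasks that write the audio straight to a bucket. The sketch below starts such a task and polls until it finishes; `synthesize_to_s3` is a hypothetical helper, the bucket and prefix are placeholders, and the caller is assumed to have `polly:StartSpeechSynthesisTask`, `polly:GetSpeechSynthesisTask`, and `s3:PutObject` on the target bucket.

```python
import time


def synthesize_to_s3(polly, text, bucket, prefix="audio/", voice_id="Joanna",
                     poll_seconds=2.0, max_polls=30):
    """Start an asynchronous synthesis task and wait for its S3 output URI."""
    task = polly.start_speech_synthesis_task(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,
        OutputS3BucketName=bucket,
        OutputS3KeyPrefix=prefix,
    )["SynthesisTask"]
    polls = 0
    while True:
        status = task["TaskStatus"]
        if status == "completed":
            return task["OutputUri"]
        if status == "failed":
            raise RuntimeError(task.get("TaskStatusReason", "synthesis failed"))
        if polls >= max_polls:
            raise TimeoutError("synthesis task did not finish in time")
        time.sleep(poll_seconds)
        task = polly.get_speech_synthesis_task(TaskId=task["TaskId"])["SynthesisTask"]
        polls += 1
```

It would be called with a real client, for example `synthesize_to_s3(boto3.client("polly"), "Hello", "my-audio-bucket")`, where the bucket name is a placeholder. In a pipeline, the polling loop could be replaced by a Step Functions wait state or an SNS notification.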
Security & Access Control
- IAM policies control access to Polly actions such as polly:SynthesizeSpeech
- Can be called from:
  - Lambda
  - EC2/Containers
  - Front-end apps via API Gateway (authenticated with a signed token)
TL;DR Summary
| Feature | Amazon Polly |
|---|---|
| Speech Synthesis | Yes |
| Neural Voices | Yes (High quality) |
| Real-time Streaming | Yes |
| Language Support | 60+ |
| SSML Customization | Yes (with pitch, pause, emphasis) |
| Cost | Per million characters |
| Free Tier | 5M standard characters/month (1st year) |
| Use Cases | Voice bots, subtitles, training audio, IVRs |