Kinesis Data Streams
๐ What is Amazon Kinesis Data Streams?¶
Amazon Kinesis Data Streams (KDS) is a real-time, serverless streaming service that enables you to ingest, buffer, and process high-throughput data streams in near real-time.
โ Best suited for real-time analytics, log ingestion, telemetry, clickstreams, and data pipeline ingestion.
๐ง Key Concepts¶
| Concept | Description |
|---|---|
| Stream | A logical entity that captures ordered data records |
| Shard | A unit of parallelism (1MB/s input, 2MB/s output, up to 1000 records/sec) |
| Record | Data blob (max 1 MB) + optional key + sequence number |
| Partition Key | Used to group and distribute records into shards |
| Consumer | Processes data from shards (e.g., Lambda, KCL app, Firehose) |
๐ฏ Use Cases¶
| Use Case | Description |
|---|---|
| ๐ Real-time log analytics | Ingest logs for real-time dashboards (CloudWatch, ELK, etc.) |
| ๐ Metrics pipeline | Collect app metrics and visualize instantly |
| ๐ Clickstream analysis | Track user activity on websites or apps |
| ๐งช Fraud detection | Analyze payment streams in real-time |
| โ๏ธ ETL staging | Buffer raw events before storing in S3 or Redshift |
๐ง How It Works¶
๐ฆ Record Structure¶
-
Max record size: 1 MB
-
Max retention: 7 days (default 24 hours)
-
Guaranteed ordering within partition key
โ๏ธ Consumer Options¶
| Consumer Type | Description | Throughput |
|---|---|---|
| Lambda | Event-based, simple integration | ~1000/sec/shard |
| KCL (Java, Python) | Client Library for heavy stream processing | Full shard capacity |
| Enhanced Fan-Out (EFO) | Dedicated 2 MB/sec pipe per consumer | High concurrency |
| Firehose | For automatic delivery to S3, Redshift, etc. | Managed |
๐ Security¶
| Layer | Options |
|---|---|
| IAM Policies | Fine-grained control over Kinesis |
| KMS | Encrypt streams at rest (SSE-KMS) |
| VPC Endpoints | Private connectivity |
| Access Control | Via policies or signed requests |
๐งฐ Terraform Example โ Kinesis + Lambda¶
1. Create Kinesis Stream¶
resource "aws_kinesis_stream" "demo" {
name = "demo-stream"
shard_count = 2
retention_period = 48
encryption_type = "KMS"
}
2. Attach Lambda as Consumer¶
resource "aws_lambda_event_source_mapping" "kinesis_lambda" {
event_source_arn = aws_kinesis_stream.demo.arn
function_name = aws_lambda_function.processor.arn
starting_position = "LATEST"
batch_size = 100
enabled = true
}
Note: Your Lambda must have
kinesis:DescribeStream,kinesis:GetRecords,kinesis:GetShardIterator, etc.
๐งช Enhanced Fan-Out Example (Optional)¶
resource "aws_kinesis_stream_consumer" "consumer" {
name = "efo-consumer"
stream_arn = aws_kinesis_stream.demo.arn
}
๐งฎ Performance Guidelines¶
| Best Practice | Why? |
|---|---|
| Use partition keys | To balance load across shards |
| Use Enhanced Fan-Out for high concurrency | Avoids throttling shared by consumers |
| Tune shard count | Based on incoming rate (1MB/sec/shard) |
| Use batching with Lambda | Reduces cost and improves throughput |
๐ Monitoring with CloudWatch¶
| Metric | Description |
|---|---|
IncomingBytes |
Total data written |
PutRecords.Throttle |
Number of throttled puts |
GetRecords.IteratorAge |
Age of last record (lower is better) |
ReadProvisionedThroughputExceeded |
Consumers over-read shard |
๐ธ Pricing (as of 2024)¶
| Resource | Cost |
|---|---|
| Ingest (PUT) | $0.014 per 1 million records (1 KB each) |
| Shard (hourly) | $0.015/hour/shard |
| Enhanced Fan-Out | $0.015/GB + $0.015/hour/consumer/shard |
| Extended retention | Extra cost if >24 hours (up to 7 days) |
๐ Kinesis vs Kafka vs SQS¶
| Feature | Kinesis | Kafka (MSK) | SQS |
|---|---|---|---|
| Delivery | At-least-once | Exactly-once (with effort) | At-least-once |
| Ordering | Per partition key | Per partition | No guarantee (Standard) |
| Use case | Real-time analytics | Event streaming, log replay | Task queues, buffering |
| Latency | Sub-second | Low | Low |
| Scaling | Manual shard scaling | Partition-based | Automatic |
| Serverless | โ Yes | โ (unless using Confluent) | โ Yes |
โ TL;DR Summary¶
| Feature | Kinesis Data Streams |
|---|---|
| Use case | Real-time streaming ingest & processing |
| Message size | Max 1 MB |
| Throughput | 1 MB/sec write, 2 MB/sec read per shard |
| Retention | 24 hours (up to 7 days) |
| Ordering | Per partition key |
| Integration | Lambda, KCL, Firehose, Glue, etc. |
| Security | IAM, VPC, SSE-KMS |
| Terraform Support | โ
aws_kinesis_stream |
Need more?¶
Want examples for:
-
โ Kinesis โ Lambda โ DynamoDB
-
โ Firehose delivery to S3 with schema conversion
-
โ Kafka to Kinesis migration
-
โ Monitoring Kinesis with CloudWatch alarms?
Let me know your use case โ Iโll tailor it for your stack!