Skip to content

Kinesis Data Streams

๐Ÿ”„ What is Amazon Kinesis Data Streams?

Amazon Kinesis Data Streams (KDS) is a real-time, serverless streaming service that enables you to ingest, buffer, and process high-throughput data streams in near real-time.

โœ… Best suited for real-time analytics, log ingestion, telemetry, clickstreams, and data pipeline ingestion.


๐Ÿง  Key Concepts

Concept Description
Stream A logical entity that captures ordered data records
Shard A unit of parallelism (1MB/s input, 2MB/s output, up to 1000 records/sec)
Record Data blob (max 1 MB) + optional key + sequence number
Partition Key Used to group and distribute records into shards
Consumer Processes data from shards (e.g., Lambda, KCL app, Firehose)

๐ŸŽฏ Use Cases

Use Case Description
๐Ÿ” Real-time log analytics Ingest logs for real-time dashboards (CloudWatch, ELK, etc.)
๐Ÿ“ˆ Metrics pipeline Collect app metrics and visualize instantly
๐Ÿ›’ Clickstream analysis Track user activity on websites or apps
๐Ÿงช Fraud detection Analyze payment streams in real-time
โš™๏ธ ETL staging Buffer raw events before storing in S3 or Redshift

๐Ÿ”ง How It Works

[Producer] โ†’ [Kinesis Stream (shards)] โ†’ [Consumer: Lambda, KCL, Firehose] โ†’ [DB/S3/etc.]

๐Ÿ“ฆ Record Structure

{
  "Data": "base64-encoded blob",
  "PartitionKey": "user123"
}
  • Max record size: 1 MB

  • Max retention: 7 days (default 24 hours)

  • Guaranteed ordering within partition key


โš™๏ธ Consumer Options

Consumer Type Description Throughput
Lambda Event-based, simple integration ~1000/sec/shard
KCL (Java, Python) Client Library for heavy stream processing Full shard capacity
Enhanced Fan-Out (EFO) Dedicated 2 MB/sec pipe per consumer High concurrency
Firehose For automatic delivery to S3, Redshift, etc. Managed

๐Ÿ” Security

Layer Options
IAM Policies Fine-grained control over Kinesis
KMS Encrypt streams at rest (SSE-KMS)
VPC Endpoints Private connectivity
Access Control Via policies or signed requests

๐Ÿงฐ Terraform Example โ€” Kinesis + Lambda

1. Create Kinesis Stream

resource "aws_kinesis_stream" "demo" {
  name             = "demo-stream"
  shard_count      = 2
  retention_period = 48
  encryption_type  = "KMS"
}

2. Attach Lambda as Consumer

resource "aws_lambda_event_source_mapping" "kinesis_lambda" {
  event_source_arn  = aws_kinesis_stream.demo.arn
  function_name     = aws_lambda_function.processor.arn
  starting_position = "LATEST"
  batch_size        = 100
  enabled           = true
}

Note: Your Lambda must have kinesis:DescribeStream, kinesis:GetRecords, kinesis:GetShardIterator, etc.


๐Ÿงช Enhanced Fan-Out Example (Optional)

resource "aws_kinesis_stream_consumer" "consumer" {
  name             = "efo-consumer"
  stream_arn       = aws_kinesis_stream.demo.arn
}

๐Ÿงฎ Performance Guidelines

Best Practice Why?
Use partition keys To balance load across shards
Use Enhanced Fan-Out for high concurrency Avoids throttling shared by consumers
Tune shard count Based on incoming rate (1MB/sec/shard)
Use batching with Lambda Reduces cost and improves throughput

๐Ÿ“Š Monitoring with CloudWatch

Metric Description
IncomingBytes Total data written
PutRecords.Throttle Number of throttled puts
GetRecords.IteratorAge Age of last record (lower is better)
ReadProvisionedThroughputExceeded Consumers over-read shard

๐Ÿ’ธ Pricing (as of 2024)

Resource Cost
Ingest (PUT) $0.014 per 1 million records (1 KB each)
Shard (hourly) $0.015/hour/shard
Enhanced Fan-Out $0.015/GB + $0.015/hour/consumer/shard
Extended retention Extra cost if >24 hours (up to 7 days)

๐Ÿ†š Kinesis vs Kafka vs SQS

Feature Kinesis Kafka (MSK) SQS
Delivery At-least-once Exactly-once (with effort) At-least-once
Ordering Per partition key Per partition No guarantee (Standard)
Use case Real-time analytics Event streaming, log replay Task queues, buffering
Latency Sub-second Low Low
Scaling Manual shard scaling Partition-based Automatic
Serverless โœ… Yes โŒ (unless using Confluent) โœ… Yes

โœ… TL;DR Summary

Feature Kinesis Data Streams
Use case Real-time streaming ingest & processing
Message size Max 1 MB
Throughput 1 MB/sec write, 2 MB/sec read per shard
Retention 24 hours (up to 7 days)
Ordering Per partition key
Integration Lambda, KCL, Firehose, Glue, etc.
Security IAM, VPC, SSE-KMS
Terraform Support โœ… aws_kinesis_stream

Need more?

Want examples for:

  • โœ… Kinesis โž Lambda โž DynamoDB

  • โœ… Firehose delivery to S3 with schema conversion

  • โœ… Kafka to Kinesis migration

  • โœ… Monitoring Kinesis with CloudWatch alarms?

Let me know your use case โ€” Iโ€™ll tailor it for your stack!