Skip to content

Kinesis Data Analytics

🧠 What is Kinesis Data Analytics?

Amazon Kinesis Data Analytics (KDA) is a fully managed service that allows you to analyze streaming data in real time using standard SQL, Apache Flink, or Java.

✅ It's built to ingest data from Kinesis Data Streams, Kinesis Data Firehose, or Kafka, process it in real-time, and output it to destinations like S3, Redshift, Elasticsearch, or another stream.


📦 Key Use Cases

Use Case Why KDA?
📈 Real-time metrics & dashboards Aggregate events (clicks, orders) in seconds
🛡️ Anomaly detection Spot fraud or unusual spikes with streaming pattern logic
📊 Log analytics Analyze logs from apps or IoT devices in near real time
🔁 Stream enrichment Join streams with static data (e.g., reference lookup)
🧠 ML inference Use pre-trained models in Flink for scoring events live

🧰 Types of Applications in KDA

Type Language / Framework Description
SQL SQL-92 Easy, serverless stream processing
Apache Flink Java, Scala, Python Advanced event processing, ML, stateful apps

🏗️ Architecture Overview

        +------------------+
        | Kinesis Stream   | ← Ingested data (IoT, logs, etc.)
        +--------+---------+
                 |
      +--------------------------+
      |  Kinesis Data Analytics  | ← (SQL or Flink app)
      +--------+-----------------+
     +----------------+----------------+----------------+
     |     S3         |    Redshift    |   OpenSearch   |
     +----------------+----------------+----------------+

You can optionally enrich the stream using reference data from S3 or embed ML models in Flink.


Feature KDA SQL KDA Flink
Language ANSI SQL-92 Java, Scala, Python (Flink)
Complexity Low (simple joins, aggregates) High (complex logic, ML, windows, stateful)
Latency Sub-second Sub-second
ML Support ❌ No ✅ Yes (Apache Flink APIs)
Stateful processing ❌ Limited ✅ Yes
Custom connectors ❌ No ✅ Yes

🧑‍💻 Example SQL in KDA

CREATE OR REPLACE STREAM "DEST_STREAM" (
    region VARCHAR(16),
    total_sales DOUBLE
);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
INSERT INTO "DEST_STREAM"
SELECT region, SUM(sales) AS total_sales
FROM "SOURCE_SQL_STREAM_001"
GROUP BY region;

🔄 Integration Points

Source Destination
Kinesis Data Streams S3
Kinesis Data Firehose Redshift
MSK (Managed Kafka) OpenSearch (Elasticsearch)
Kafka (Self-managed on EC2) Another Kinesis stream
IoT Core Lambda, EventBridge

🧠 Notable Features

Feature Description
Streaming SQL Write SQL queries on continuous streams
Apache Flink API Custom code, windowing, joins, aggregations
Durable State Maintain state across events using Flink
Autoscaling / Fault Tolerant Resilient with checkpointing and replay
Managed Connectors Kafka, Kinesis, Flink S3 Sink

💰 Pricing Summary (2024)

Component Price
SQL apps $0.11 per GB of input
Flink apps $0.11 per KDA Processing Unit (KPU) per hour
Checkpointing storage Included up to 50 GB/month
Max parallelism Controlled by KPUs

🧠 Tip: 1 KPU = 1 vCPU + 4 GB RAM.


resource "aws_kinesisanalyticsv2_application" "flink_app" {
  name                   = "flink-example"
  runtime_environment    = "FLINK-1_15"
  service_execution_role = aws_iam_role.flink_exec.arn

  application_configuration {
    application_code_configuration {
      code_content {
        s3_content_location {
          bucket_arn = "arn:aws:s3:::my-flink-code"
          file_key   = "app.jar"
        }
      }
      code_content_type = "ZIP"
    }

    flink_application_configuration {
      parallelism_configuration {
        configuration_type = "DEFAULT"
      }
    }

    environment_properties {
      property_group {
        property_group_id = "FlinkAppProperties"
        properties = {
          input.stream = "my-kinesis-stream"
          output.bucket = "my-output-bucket"
        }
      }
    }

    application_snapshot_configuration {
      snapshots_enabled = true
    }
  }
}

✅ TL;DR Summary

Feature Amazon Kinesis Data Analytics
Serverless ✅ Yes
Languages SQL, Java, Scala (Flink)
Use Cases Real-time analytics, ETL, anomaly detection
Input Sources Kinesis, Kafka, Firehose
Destinations S3, Redshift, Elasticsearch, Kinesis
Stateful processing ✅ Flink only
Terraform support ✅ Yes (v2 apps only)