Skip to content

Redshift

๐Ÿง  What is Amazon Redshift?

Amazon Redshift is a fully managed, petabyte-scale data warehouse service designed for OLAP (analytical) workloads. It enables you to run complex queries across large datasets stored in Redshift, S3 (via Redshift Spectrum), and other sources using standard SQL.

โœ… It's optimized for high-performance querying, data warehousing, and business intelligence (BI) reporting.


๐Ÿงฐ Common Use Cases

Use Case Why Redshift?
๐Ÿ“Š Business Intelligence Fast queries on structured datasets
๐Ÿ›ข๏ธ Centralized Data Lake Combine S3 + Redshift via Redshift Spectrum
๐Ÿ” ETL Processing Load/transform data from operational DBs
๐Ÿงช Analytics over semi-structured data Supports JSON, Parquet, Avro, etc.
๐Ÿ”— Integration with BI tools QuickSight, Tableau, Power BI

๐Ÿงฑ Core Components

Component Description
Cluster A collection of nodes managed as a single data warehouse
Leader Node Receives queries, plans execution, and aggregates results
Compute Nodes Store data and perform query processing
Node Types Dense Storage (DS2) and Dense Compute (DC2), RA3 (scalable managed storage)
Data Distribution EVEN, KEY, or ALL to manage data placement

๐Ÿ—๏ธ Architecture

           +--------------------------+
           |     BI Tool / SQL       |
           +------------+------------+
                        โ†“
                 +------+------+
                 | Leader Node |
                 +------+------+
                        โ†“
           +--------------------------+
           |    Compute Nodes (RA3)   |
           +--------------------------+
                     โ†“   โ†“   โ†“
               Data Blocks in Managed Storage
  • Leader Node handles SQL parsing and query planning.

  • Compute Nodes execute the query.

  • RA3 nodes separate storage and compute, allowing you to scale independently.


๐Ÿ“ฆ Redshift Spectrum

Redshift Spectrum lets you query data in Amazon S3 without loading it into Redshift.

๐Ÿ”ฅ Great for querying raw log files or cold storage without duplicating data.

  • Supports Parquet, ORC, CSV, JSON

  • Uses Glue Data Catalog for schema

  • Ideal for data lake + warehouse hybrid


๐Ÿ” Security

Feature Description
IAM Policies Control Redshift cluster access
VPC Security Groups Define inbound/outbound traffic
KMS Encryption Encrypt data at rest (built-in or custom key)
SSL in Transit Protect client connections to Redshift
Audit Logging Enable logging to S3 for access/query events
Row-level Security (RLS) Fine-grained access control for rows (preview)

๐Ÿงช Performance Features

Feature Description
Columnar Storage Fast scanning, great compression
Data Compression Built-in encoding for performance & cost savings
Query Optimization Advanced planner, result caching, materialized views
Concurrency Scaling Auto-spawns temporary clusters for burst workloads
Workload Management (WLM) Assign queue priorities to workloads (ETL vs dashboard queries)
Sort Keys + Distribution Improves join/filter performance

๐Ÿ› ๏ธ Terraform Example โ€“ Redshift Cluster (RA3)

resource "aws_redshift_subnet_group" "example" {
  name       = "example-subnet-group"
  subnet_ids = ["subnet-12345", "subnet-67890"]
}

resource "aws_redshift_cluster" "example" {
  cluster_identifier = "my-redshift-cluster"
  node_type          = "ra3.4xlarge"
  number_of_nodes    = 2

  database_name      = "analytics"
  master_username    = "admin"
  master_password    = "StrongPassword123"

  cluster_subnet_group_name = aws_redshift_subnet_group.example.name
  publicly_accessible       = false
  encrypted                 = true

  tags = {
    Environment = "prod"
  }
}

๐Ÿ’ฐ Pricing (2024 Overview)

Type Cost (approx)
RA3.4xlarge (on-demand) ~$3.26/hour/node
Managed Storage (RA3) ~$0.024/GB-month (billed separately)
Concurrency Scaling Free for 1 hour/day (then billed)
Spectrum (S3 query) $5 per TB scanned
Reserved Instances Save up to 75% on 1โ€“3 year term

๐Ÿง  Tip: Use compression + Spectrum to keep storage low and only query active data.


๐ŸŽฏ Comparison with Other Services

Use Case Use This
Petabyte-scale analytics โœ… Amazon Redshift
Real-time streaming analytics Amazon Kinesis + Redshift
Operational DB (OLTP) Amazon RDS / Aurora
S3 SQL-like ad hoc queries Amazon Athena
Massive joins with data lake Redshift Spectrum

โœ… TL;DR Summary

Feature Amazon Redshift
Query Engine PostgreSQL-compatible SQL
Storage Format Columnar, compressed
Serverless Mode โœ… Available (since 2022)
Best For Large-scale analytics, BI workloads
Integrations QuickSight, Glue, S3, Kinesis, Lambda
Terraform Support โœ… Fully supported

๐Ÿ”„ Optional: Redshift Serverless (2024)

Benefit Description
Serverless No cluster to manage
Pay-per-query You pay for Redshift Processing Units (RPUs)
Easy to scale Great for unpredictable workloads
Terraform Support โœ… Supported via aws_redshiftserverless_* resources