Redshift
๐ง What is Amazon Redshift?¶
Amazon Redshift is a fully managed, petabyte-scale data warehouse service designed for OLAP (analytical) workloads. It enables you to run complex queries across large datasets stored in Redshift, S3 (via Redshift Spectrum), and other sources using standard SQL.
โ It's optimized for high-performance querying, data warehousing, and business intelligence (BI) reporting.
๐งฐ Common Use Cases¶
| Use Case | Why Redshift? |
|---|---|
| ๐ Business Intelligence | Fast queries on structured datasets |
| ๐ข๏ธ Centralized Data Lake | Combine S3 + Redshift via Redshift Spectrum |
| ๐ ETL Processing | Load/transform data from operational DBs |
| ๐งช Analytics over semi-structured data | Supports JSON, Parquet, Avro, etc. |
| ๐ Integration with BI tools | QuickSight, Tableau, Power BI |
๐งฑ Core Components¶
| Component | Description |
|---|---|
| Cluster | A collection of nodes managed as a single data warehouse |
| Leader Node | Receives queries, plans execution, and aggregates results |
| Compute Nodes | Store data and perform query processing |
| Node Types | Dense Storage (DS2) and Dense Compute (DC2), RA3 (scalable managed storage) |
| Data Distribution | EVEN, KEY, or ALL to manage data placement |
๐๏ธ Architecture¶
+--------------------------+
| BI Tool / SQL |
+------------+------------+
โ
+------+------+
| Leader Node |
+------+------+
โ
+--------------------------+
| Compute Nodes (RA3) |
+--------------------------+
โ โ โ
Data Blocks in Managed Storage
-
Leader Node handles SQL parsing and query planning.
-
Compute Nodes execute the query.
-
RA3 nodes separate storage and compute, allowing you to scale independently.
๐ฆ Redshift Spectrum¶
Redshift Spectrum lets you query data in Amazon S3 without loading it into Redshift.
๐ฅ Great for querying raw log files or cold storage without duplicating data.
-
Supports Parquet, ORC, CSV, JSON
-
Uses Glue Data Catalog for schema
-
Ideal for data lake + warehouse hybrid
๐ Security¶
| Feature | Description |
|---|---|
| IAM Policies | Control Redshift cluster access |
| VPC Security Groups | Define inbound/outbound traffic |
| KMS Encryption | Encrypt data at rest (built-in or custom key) |
| SSL in Transit | Protect client connections to Redshift |
| Audit Logging | Enable logging to S3 for access/query events |
| Row-level Security (RLS) | Fine-grained access control for rows (preview) |
๐งช Performance Features¶
| Feature | Description |
|---|---|
| Columnar Storage | Fast scanning, great compression |
| Data Compression | Built-in encoding for performance & cost savings |
| Query Optimization | Advanced planner, result caching, materialized views |
| Concurrency Scaling | Auto-spawns temporary clusters for burst workloads |
| Workload Management (WLM) | Assign queue priorities to workloads (ETL vs dashboard queries) |
| Sort Keys + Distribution | Improves join/filter performance |
๐ ๏ธ Terraform Example โ Redshift Cluster (RA3)¶
resource "aws_redshift_subnet_group" "example" {
name = "example-subnet-group"
subnet_ids = ["subnet-12345", "subnet-67890"]
}
resource "aws_redshift_cluster" "example" {
cluster_identifier = "my-redshift-cluster"
node_type = "ra3.4xlarge"
number_of_nodes = 2
database_name = "analytics"
master_username = "admin"
master_password = "StrongPassword123"
cluster_subnet_group_name = aws_redshift_subnet_group.example.name
publicly_accessible = false
encrypted = true
tags = {
Environment = "prod"
}
}
๐ฐ Pricing (2024 Overview)¶
| Type | Cost (approx) |
|---|---|
| RA3.4xlarge (on-demand) | ~$3.26/hour/node |
| Managed Storage (RA3) | ~$0.024/GB-month (billed separately) |
| Concurrency Scaling | Free for 1 hour/day (then billed) |
| Spectrum (S3 query) | $5 per TB scanned |
| Reserved Instances | Save up to 75% on 1โ3 year term |
๐ง Tip: Use compression + Spectrum to keep storage low and only query active data.
๐ฏ Comparison with Other Services¶
| Use Case | Use This |
|---|---|
| Petabyte-scale analytics | โ Amazon Redshift |
| Real-time streaming analytics | Amazon Kinesis + Redshift |
| Operational DB (OLTP) | Amazon RDS / Aurora |
| S3 SQL-like ad hoc queries | Amazon Athena |
| Massive joins with data lake | Redshift Spectrum |
โ TL;DR Summary¶
| Feature | Amazon Redshift |
|---|---|
| Query Engine | PostgreSQL-compatible SQL |
| Storage Format | Columnar, compressed |
| Serverless Mode | โ Available (since 2022) |
| Best For | Large-scale analytics, BI workloads |
| Integrations | QuickSight, Glue, S3, Kinesis, Lambda |
| Terraform Support | โ Fully supported |
๐ Optional: Redshift Serverless (2024)¶
| Benefit | Description |
|---|---|
| Serverless | No cluster to manage |
| Pay-per-query | You pay for Redshift Processing Units (RPUs) |
| Easy to scale | Great for unpredictable workloads |
| Terraform Support | โ
Supported via aws_redshiftserverless_* resources |