Redis High Availability
5️⃣ How to make Redis Highly Available (HA)
High availability = Redis should keep working even if one node dies.
There are two classic designs:
🧱 A. Redis Master–Replica + Sentinel
Components:
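- 1 primary (accepts writes)
- 1+ replicas (copy data from the primary)
- Sentinel processes that monitor the Redis nodes and perform failover. Run at least three, on separate machines, so a majority can still agree when one is lost.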
How it works:
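- All apps read from and write to the primary. They find it by asking Sentinel for the current primary address, or through a VIP / DNS name that is repointed on failover.
- Replicas continuously sync from the primary. Replication is asynchronous, so a failover can lose the most recent writes.
- If the primary dies:
  - Sentinel promotes a replica to be the new primary.
  - Apps reconnect (a Sentinel-aware client library usually handles this with retries and the new address).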
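The failover steps above can be sketched as a small simulation. This is not Sentinel's own code, only an illustration of rules described in the Sentinel documentation: the primary counts as objectively down once at least `quorum` Sentinels report it unreachable, the failover must be authorized by a majority of all Sentinels, and the replica to promote is chosen by priority, then replication offset, then run ID. The names `Replica`, `is_objectively_down`, `failover_authorized`, and `pick_replica` are made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    run_id: str
    priority: int      # replica-priority; 0 means "never promote"
    repl_offset: int   # how much of the primary's data it has received


def is_objectively_down(down_votes: int, quorum: int) -> bool:
    """The primary is 'objectively down' once `quorum` Sentinels agree."""
    return down_votes >= quorum


def failover_authorized(reachable_sentinels: int, total_sentinels: int) -> bool:
    """A failover needs a majority of all known Sentinels."""
    return reachable_sentinels > total_sentinels // 2


def pick_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Prefer lowest non-zero priority, then highest offset, then smallest run ID."""
    candidates = [r for r in replicas if r.priority > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("b2", priority=100, repl_offset=900),
    Replica("a1", priority=100, repl_offset=950),  # most up to date
    Replica("c3", priority=0, repl_offset=999),    # excluded: priority 0
]

# Two of three Sentinels see the primary as down, with quorum set to 2.
if is_objectively_down(down_votes=2, quorum=2) and failover_authorized(2, 3):
    promoted = pick_replica(replicas)
    print(promoted.run_id if promoted else "no eligible replica")
```

With two Sentinels out of three, a single surviving Sentinel could never reach a majority, which is why three is the usual minimum.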
Pros:
- Simple concept
- Good for many workloads
Cons:
- No automatic sharding (one primary is still a limit)
- Need more effort to set up & monitor Sentinel
🧩 B. Redis Cluster (sharding + HA)
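- Data is sharded across multiple nodes (keys map to 16384 hash slots, and each shard owns a subset of the slots)
- Each shard has:
  - 1 primary
  - 1+ replicas
- Automatic failover. Rebalancing slots between shards can be done online, but an operator has to trigger it (for example with redis-cli --cluster rebalance).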
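To make the sharding concrete, here is a minimal sketch of how a key is mapped to a shard. Redis Cluster assigns every key to one of 16384 hash slots using CRC16 of the key modulo 16384, and if the key contains a non-empty `{...}` hash tag, only the tag is hashed. The function name `key_hash_slot` is made up for this example; Python's `binascii.crc_hqx` with an initial value of 0 is used here as the CRC16 variant.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384  # fixed number of slots in Redis Cluster


def key_hash_slot(key: bytes) -> int:
    """Return the cluster slot for `key`, honouring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag counts; with "{}" the whole key is hashed.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # crc_hqx with initial value 0 is the CRC16/XMODEM variant.
    return crc_hqx(key, 0) % HASH_SLOTS


print(key_hash_slot(b"foo"))
# Keys sharing a hash tag land in the same slot, so they live on the same shard.
print(key_hash_slot(b"{user:42}:cart") == key_hash_slot(b"{user:42}:profile"))
```

A Cluster-aware client does this calculation itself and sends each command straight to the node that owns the slot.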
Pros:
- Horizontal scale (more data, more throughput)
- Built-in HA
Cons:
- Client library must be Cluster-aware
- A bit more complex to operate
🛡️ Extra HA & Reliability Practices
No matter which design:
- Enable persistence
  - RDB snapshots (periodic; writes since the last snapshot are lost on a crash)
  - AOF (append-only file) logs every write, for better durability
- Backups
  - Copy RDB/AOF files to S3/NFS/backup server
- Multi-AZ deployment
  - Run nodes in different availability zones / racks
- Connection retries
  - In Node.js/Go, configure the Redis client with reconnect and timeouts
- Monitoring
  - Use redis_exporter + Prometheus + Grafana
  - Track: memory, CPU, hit rate, latency, evictions
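As an illustration of the connection-retry practice, here is a minimal sketch of retrying with capped exponential backoff and jitter. It is written in Python to stay self-contained, although the text mentions Node.js/Go; real Redis clients ship their own reconnect and retry options, which should be preferred. The helper `call_with_retry` and the stand-in `flaky_get` are hypothetical, and Python's built-in `ConnectionError` stands in for whatever exception a real client raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `op`, retrying on ConnectionError with capped backoff and jitter."""
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from reconnecting at once.
            sleep(random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")


# Stand-in for a Redis call that fails twice (e.g. during a failover).
calls = {"n": 0}


def flaky_get() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("primary unavailable")
    return "value"


print(call_with_retry(flaky_get, sleep=lambda _: None))
```

The cap on the delay matters during a failover: the total retry window should be longer than the time Sentinel or Cluster needs to promote a replica, otherwise the app gives up just before the new primary is ready.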
🧠 Quick Cheat Sheet
- Small dev / single server → Single Redis on VM or Docker
- Small production → Primary + replica with Sentinel
- Heavy traffic / large data → Redis Cluster on VMs or K8s
- You don’t want ops overhead → (If allowed) use managed Redis like AWS ElastiCache / Azure Cache / GCP Memorystore