Circuit Breaker Pattern

Overview

The Circuit Breaker pattern prevents an application from repeatedly trying to execute an operation that's likely to fail. It acts like an electrical circuit breaker - when failures reach a threshold, the circuit "opens" and requests fail immediately without attempting the operation, giving the failing service time to recover.


Problem Statement

Cascading Failures

Service A → Service B → Service C (FAILING)
    │           │            │
    │           │            └─ Timeout (30s)
    │           └─ Timeout (30s)
    └─ Timeout (30s)

Worst-case response time: up to 90 seconds as the per-hop timeouts stack up.
Every upstream service sits blocked, waiting on the failing service.

Issues Without Circuit Breaker

  • Resource Exhaustion - Threads blocked waiting for timeouts
  • Cascading Failures - One service failure affects entire system
  • Slow Recovery - Failed service overwhelmed with retry attempts
  • Poor User Experience - Long wait times for inevitable failures
  • Wasted Resources - CPU/memory consumed by doomed requests

When to Use Circuit Breaker

✅ Calling external services/APIs
✅ Database connections
✅ Microservices communication
✅ Network operations
✅ Any operation that can fail
✅ Need graceful degradation

When NOT to Use

❌ Internal memory operations
❌ Operations that must succeed
❌ Single-user applications
❌ Operations with no timeout


Architecture Diagram

Circuit Breaker States

                    ┌─────────────────────────────────────┐
                    │                                     │
                    │         CLOSED STATE                │
                    │    (Normal Operation)               │
                    │                                     │
                    │  • All requests pass through        │
                    │  • Counting failures                │
                    │  • Success resets counter           │
                    │                                     │
                    └──────────────┬──────────────────────┘
                                   │ Failure threshold reached
                                   │ (e.g., 5 failures in 10s)
                    ┌─────────────────────────────────────┐
                    │                                     │
                    │          OPEN STATE                 │
                    │      (Circuit Tripped)              │
                    │                                     │
                    │  • All requests fail immediately    │
                    │  • No calls to failing service      │
                    │  • Return fallback response         │
                    │  • Wait for timeout period          │
                    │                                     │
                    └──────────────┬──────────────────────┘
                                   │ Timeout expires
                                   │ (e.g., after 60s)
                    ┌─────────────────────────────────────┐
                    │                                     │
                    │        HALF-OPEN STATE              │
                    │      (Testing Recovery)             │
                    │                                     │
                    │  • Limited requests allowed         │
                    │  • Testing if service recovered     │
                    │  • Success → CLOSED                 │
                    │  • Failure → OPEN                   │
                    │                                     │
                    └──────────────┬──────────────────────┘
                    ┌──────────────┴──────────────┐
                    │                             │
                    │ Success                     │ Failure
                    ▼                             ▼
            ┌───────────────┐           ┌─────────────────┐
            │    CLOSED     │           │      OPEN       │
            │  (Recovered)  │           │  (Still Broken) │
            └───────────────┘           └─────────────────┘

System Architecture with Circuit Breaker

┌──────────────────────────────────────────────────────────────────┐
│                        Client Application                        │
└────────────────────────────┬─────────────────────────────────────┘
                             │ Request
              ┌──────────────────────────────┐
              │      Circuit Breaker         │
              │                              │
              │  State: CLOSED/OPEN/HALF     │
              │  Failure Count: 3/5          │
              │  Last Failure: 2s ago        │
              │  Timeout: 60s                │
              └──────────┬───────────────────┘
         ┌───────────────┼───────────────┐
         │               │               │
         │ CLOSED        │ OPEN          │ HALF-OPEN
         │               │               │
         ▼               ▼               ▼
┌────────────────┐  ┌────────────┐  ┌────────────┐
│  Call Service  │  │  Fail Fast │  │ Test Call  │
│                │  │  Return    │  │            │
│                │  │  Fallback  │  │            │
└───────┬────────┘  └────────────┘  └─────┬──────┘
        │                                  │
        │ Success/Failure                  │ Success/Failure
        │                                  │
        ▼                                  ▼
┌────────────────────────────────────────────────┐
│           External Service / API               │
│                                                │
│  • Payment Gateway                             │
│  • Database                                    │
│  • Third-party API                             │
│  • Microservice                                │
└────────────────────────────────────────────────┘

Workflow Explanation

Normal Operation (CLOSED State)

1. Request arrives
2. Circuit Breaker checks state → CLOSED
3. Forward request to service
4. Service responds successfully
5. Reset failure counter
6. Return response to client

Failure Detection (CLOSED → OPEN)

1. Request arrives
2. Circuit Breaker forwards to service
3. Service fails (timeout/error)
4. Increment failure counter (3/5)
5. Another request arrives
6. Service fails again (4/5)
7. Another request arrives
8. Service fails again (5/5) ← THRESHOLD REACHED
9. Circuit Breaker opens
10. Start timeout timer (60s)

Fast Fail (OPEN State)

1. Request arrives
2. Circuit Breaker checks state → OPEN
3. Check if timeout expired → NO
4. Fail immediately (no service call)
5. Return fallback response
6. Total time: < 1ms (vs 30s timeout)

Recovery Testing (HALF-OPEN State)

1. Timeout expires (60s passed)
2. Circuit Breaker → HALF-OPEN
3. Next request arrives
4. Allow ONE test request through
5. Forward to service
6. Service responds successfully
7. Circuit Breaker → CLOSED
8. Resume normal operation
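
To make the walkthroughs above concrete, here is a minimal, dependency-free sketch of the state machine in JavaScript. It is illustrative only; the library-based examples in the next section are what you would use in practice.

// A deliberately small circuit breaker for illustration (not production code)
class SimpleCircuitBreaker {
  constructor(action, { failureThreshold = 5, resetTimeoutMs = 60000 } = {}) {
    this.action = action;                 // the protected async function
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.state = 'CLOSED';
    this.failureCount = 0;
    this.openedAt = 0;
  }

  async fire(...args) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        // Fast fail: no call is made to the failing service
        throw new Error('Circuit is OPEN - failing fast');
      }
      this.state = 'HALF_OPEN';           // timeout expired, allow a test call
    }

    try {
      const result = await this.action(...args);
      this.state = 'CLOSED';              // success: close and reset the counter
      this.failureCount = 0;
      return result;
    } catch (err) {
      this.failureCount += 1;
      if (this.state === 'HALF_OPEN' || this.failureCount >= this.failureThreshold) {
        this.state = 'OPEN';              // trip (or re-trip) the circuit
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}

// Usage: const breaker = new SimpleCircuitBreaker(callExternalAPI);
//        const user = await breaker.fire(userId);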

Implementation Examples

Node.js with opossum

const CircuitBreaker = require('opossum');
const axios = require('axios');
const express = require('express');

const app = express();

// Function to protect
async function callExternalAPI(userId) {
  const response = await axios.get(`https://api.example.com/users/${userId}`);
  return response.data;
}

// Circuit breaker options
const options = {
  timeout: 3000,                // Fail calls that take longer than 3s
  errorThresholdPercentage: 50, // Open after 50% of requests in the window fail
  resetTimeout: 30000,          // Try a test request (half-open) after 30s
  rollingCountTimeout: 10000,   // 10s rolling window for failure statistics
  rollingCountBuckets: 10,      // Number of buckets in the rolling window
  name: 'externalAPI'
};

// Create circuit breaker
const breaker = new CircuitBreaker(callExternalAPI, options);

// Fallback response used when the circuit is open or the call fails
breaker.fallback((userId) => ({
  id: userId,
  name: 'Unknown',
  cached: true,
  message: 'Service temporarily unavailable'
}));

// Event listeners
breaker.on('open', () => {
  console.log('Circuit breaker opened - service is failing');
});

breaker.on('halfOpen', () => {
  console.log('Circuit breaker half-open - testing service');
});

breaker.on('close', () => {
  console.log('Circuit breaker closed - service recovered');
});

breaker.on('fallback', (result) => {
  console.log('Fallback executed:', result);
});

// Usage
async function getUser(userId) {
  try {
    const user = await breaker.fire(userId);
    return user;
  } catch (error) {
    console.error('Request failed:', error.message);
    throw error;
  }
}

// Express endpoint
app.get('/users/:id', async (req, res) => {
  try {
    const user = await getUser(req.params.id);
    res.json(user);
  } catch (error) {
    res.status(503).json({
      error: 'Service unavailable',
      message: error.message
    });
  }
});

// Health check endpoint
app.get('/health/circuit-breaker', (req, res) => {
  res.json({
    name: breaker.name,
    state: breaker.opened ? 'OPEN' : breaker.halfOpen ? 'HALF_OPEN' : 'CLOSED',
    stats: breaker.stats
  });
});

Go with gobreaker

package main

import (
    "encoding/json"
    "errors"
    "fmt"
    "net/http"
    "time"

    "github.com/sony/gobreaker"
)

// User is the shape of the external API's response (and of the fallback)
type User struct {
    ID      string `json:"id"`
    Name    string `json:"name"`
    Cached  bool   `json:"cached,omitempty"`
    Message string `json:"message,omitempty"`
}

// Create circuit breaker
var cb *gobreaker.CircuitBreaker

func init() {
    settings := gobreaker.Settings{
        Name:        "ExternalAPI",
        MaxRequests: 3,                    // Max requests in half-open
        Interval:    time.Second * 10,     // Rolling window
        Timeout:     time.Second * 60,     // Time before half-open
        ReadyToTrip: func(counts gobreaker.Counts) bool {
            failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)
            return counts.Requests >= 3 && failureRatio >= 0.6
        },
        OnStateChange: func(name string, from gobreaker.State, to gobreaker.State) {
            fmt.Printf("Circuit Breaker '%s': %s -> %s\n", name, from, to)
        },
    }

    cb = gobreaker.NewCircuitBreaker(settings)
}

// Protected function
func callExternalAPI(userID string) (User, error) {
    result, err := cb.Execute(func() (interface{}, error) {
        // Make HTTP request
        resp, err := http.Get(fmt.Sprintf("https://api.example.com/users/%s", userID))
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()

        if resp.StatusCode != 200 {
            return nil, fmt.Errorf("API returned status %d", resp.StatusCode)
        }

        var user User
        if err := json.NewDecoder(resp.Body).Decode(&user); err != nil {
            return nil, err
        }

        return user, nil
    })

    if err != nil {
        // Return fallback
        return User{
            ID:      userID,
            Name:    "Unknown",
            Cached:  true,
            Message: "Service temporarily unavailable",
        }, err
    }

    return result.(User), nil
}

// HTTP handler
func getUserHandler(w http.ResponseWriter, r *http.Request) {
    userID := r.URL.Query().Get("id")

    user, err := callExternalAPI(userID)
    if err != nil {
        if errors.Is(err, gobreaker.ErrOpenState) {
            w.WriteHeader(http.StatusServiceUnavailable)
            json.NewEncoder(w).Encode(map[string]string{
                "error": "Circuit breaker is open",
                "message": "Service temporarily unavailable",
            })
            return
        }
    }

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(user)
}

// Health check
func healthHandler(w http.ResponseWriter, r *http.Request) {
    state := cb.State()

    status := map[string]interface{}{
        "name":   cb.Name(),
        "state":  state.String(),
        "counts": cb.Counts(),
    }

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(status)
}

Java with Resilience4j

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import java.time.Duration;

public class CircuitBreakerExample {

    private final CircuitBreaker circuitBreaker;
    private final ExternalAPIClient apiClient;

    public CircuitBreakerExample() {
        // Configure circuit breaker
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50)                    // 50% failure rate
            .waitDurationInOpenState(Duration.ofSeconds(60))  // Wait 60s
            .slidingWindowSize(10)                       // Last 10 calls
            .minimumNumberOfCalls(5)                     // Min 5 calls
            .permittedNumberOfCallsInHalfOpenState(3)    // 3 test calls
            .automaticTransitionFromOpenToHalfOpenEnabled(true)
            .build();

        CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(config);
        this.circuitBreaker = registry.circuitBreaker("externalAPI");

        // Event listeners
        circuitBreaker.getEventPublisher()
            .onStateTransition(event -> 
                System.out.println("Circuit Breaker: " + event.getStateTransition())
            )
            .onError(event -> 
                System.out.println("Error: " + event.getThrowable().getMessage())
            );

        this.apiClient = new ExternalAPIClient();
    }

    public User getUser(String userId) {
        return circuitBreaker.executeSupplier(() -> {
            try {
                return apiClient.fetchUser(userId);
            } catch (Exception e) {
                throw new RuntimeException("API call failed", e);
            }
        });
    }

    public User getUserWithFallback(String userId) {
        try {
            return circuitBreaker.executeSupplier(() -> apiClient.fetchUser(userId));
        } catch (Exception e) {
            // Fallback when the call fails or the circuit is open (CallNotPermittedException)
            return new User(userId, "Unknown", true,
                "Service temporarily unavailable");
        }
    }
}

Configuration Parameters

Key Settings

Parameter           Description                                Typical Value
------------------  -----------------------------------------  -----------------
Failure Threshold   Number/percentage of failures to open      50% or 5 failures
Timeout             Max time to wait for a response            3-10 seconds
Reset Timeout       Time in OPEN before trying half-open       30-60 seconds
Success Threshold   Successes needed to close again            2-3 requests
Rolling Window      Time window for counting failures          10-60 seconds
Half-Open Requests  Test requests allowed in half-open state   1-3 requests

Example Configuration

circuit-breaker:
  external-api:
    failure-rate-threshold: 50        # 50% failures
    slow-call-rate-threshold: 50      # 50% slow calls
    slow-call-duration-threshold: 3s  # > 3s is slow
    wait-duration-in-open-state: 60s  # Wait 60s
    sliding-window-type: COUNT_BASED  # or TIME_BASED
    sliding-window-size: 10           # Last 10 calls
    minimum-number-of-calls: 5        # Min 5 calls
    permitted-calls-in-half-open: 3   # 3 test calls
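
For comparison, roughly the same thresholds expressed as opossum options (a sketch with illustrative values; opossum does not expose knobs for slow calls, the success threshold, or the number of half-open test calls):

// Rough opossum equivalents for the parameters above (illustrative values)
const options = {
  timeout: 3000,                // "Timeout": fail calls slower than 3s
  errorThresholdPercentage: 50, // "Failure Threshold": open at a 50% failure rate
  resetTimeout: 30000,          // "Reset Timeout": move to half-open after 30s
  rollingCountTimeout: 10000,   // "Rolling Window": 10s window for failure stats
  volumeThreshold: 5            // Minimum calls in the window before the circuit can open
};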

Fallback Strategies

1. Cached Response

const cache = new Map();

const fallback = (userId) => {
  // Return cached data if available
  if (cache.has(userId)) {
    return {
      ...cache.get(userId),
      cached: true,
      timestamp: new Date()
    };
  }

  // Return default response
  return {
    id: userId,
    name: 'Unknown',
    message: 'Service unavailable'
  };
};
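
For the cached fallback to have anything to return, the cache has to be populated on successful calls. A minimal sketch, assuming the callExternalAPI function, options, and opossum breaker from the earlier example:

// Wrap the protected call so every successful response is cached for the fallback
async function callExternalAPIWithCache(userId) {
  const data = await callExternalAPI(userId);
  cache.set(userId, data);   // remembered for use while the circuit is open
  return data;
}

const breaker = new CircuitBreaker(callExternalAPIWithCache, options);
breaker.fallback(fallback);  // the cached-response fallback defined above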

2. Default Response

const fallback = () => ({
  status: 'unavailable',
  message: 'Service temporarily unavailable. Please try again later.',
  retryAfter: 60
});

3. Alternative Service

const fallback = async (userId) => {
  // Try backup service
  try {
    return await backupService.getUser(userId);
  } catch (error) {
    return defaultResponse(userId);
  }
};

4. Degraded Functionality

const fallback = (userId) => ({
  id: userId,
  name: 'User',
  features: {
    basicProfile: true,
    advancedFeatures: false  // Disable advanced features
  },
  message: 'Running in degraded mode'
});

Monitoring & Metrics

Key Metrics to Track

┌─────────────────────────────────────────┐
│      Circuit Breaker Metrics            │
├─────────────────────────────────────────┤
│ • Current State (CLOSED/OPEN/HALF)      │
│ • Failure Rate (%)                      │
│ • Success Rate (%)                      │
│ • Total Requests                        │
│ • Failed Requests                       │
│ • Successful Requests                   │
│ • Rejected Requests (circuit open)      │
│ • Response Time (p50, p95, p99)         │
│ • Time in Each State                    │
│ • State Transitions Count               │
└─────────────────────────────────────────┘

Prometheus Metrics

const { Gauge, Counter, Histogram } = require('prom-client');

// Circuit breaker state
const circuitState = new Gauge({
  name: 'circuit_breaker_state',
  help: 'Circuit breaker state (0=closed, 1=open, 2=half-open)',
  labelNames: ['name']
});

// Request counter
const requests = new Counter({
  name: 'circuit_breaker_requests_total',
  help: 'Total requests through circuit breaker',
  labelNames: ['name', 'result']  // result: success, failure, rejected
});

// Response time
const responseTime = new Histogram({
  name: 'circuit_breaker_response_time_seconds',
  help: 'Response time through circuit breaker',
  labelNames: ['name']
});

// Update metrics
breaker.on('success', () => {
  requests.inc({ name: breaker.name, result: 'success' });
});

breaker.on('failure', () => {
  requests.inc({ name: breaker.name, result: 'failure' });
});

breaker.on('reject', () => {
  requests.inc({ name: breaker.name, result: 'rejected' });
});

breaker.on('open', () => {
  circuitState.set({ name: breaker.name }, 1);
});

breaker.on('halfOpen', () => {
  circuitState.set({ name: breaker.name }, 2);
});

breaker.on('close', () => {
  circuitState.set({ name: breaker.name }, 0);
});
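
To make these metrics visible to Prometheus, expose the default registry over HTTP (a minimal sketch, assuming the Express app from the earlier example):

const { register } = require('prom-client');

// Endpoint scraped by Prometheus
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});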

Grafana Dashboard

┌─────────────────────────────────────────────────────────┐
│  Circuit Breaker: External API                          │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  State: CLOSED          Uptime: 99.5%                   │
│                                                          │
│  ┌──────────────────────────────────────────────────┐  │
│  │  Request Rate                                     │  │
│  │  ▁▂▃▄▅▆▇█▇▆▅▄▃▂▁                                 │  │
│  │  1000 req/s                                       │  │
│  └──────────────────────────────────────────────────┘  │
│                                                          │
│  ┌──────────────────────────────────────────────────┐  │
│  │  Success Rate                                     │  │
│  │  ████████████████████████████████████░░  95%     │  │
│  └──────────────────────────────────────────────────┘  │
│                                                          │
│  ┌──────────────────────────────────────────────────┐  │
│  │  Response Time (p95)                              │  │
│  │  ▁▁▂▂▃▃▄▄▅▅▆▆▇▇██                                │  │
│  │  250ms                                            │  │
│  └──────────────────────────────────────────────────┘  │
│                                                          │
│  State Transitions (Last 24h): 2                        │
│  Last Open: 2 hours ago (Duration: 60s)                 │
└─────────────────────────────────────────────────────────┘

Best Practices

Configuration

  • Set Appropriate Thresholds - Not too sensitive, not too lenient
  • Consider Service SLA - Align timeouts with the SLA
  • Test Failure Scenarios - Verify the circuit breaker actually trips
  • Monitor Metrics - Track state transitions
  • Document Behavior - Make sure the team understands the fallbacks

Implementation

  • Implement Fallbacks - Always provide a fallback response
  • Use Caching - Cache responses for the fallback
  • Log State Changes - Track when the circuit opens/closes
  • Gradual Recovery - Use the half-open state properly
  • Per-Service Breakers - Separate breaker per dependency (see the sketch below)
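
To illustrate the per-service-breaker recommendation, here is a minimal sketch (hypothetical service names and calls) that lazily creates and reuses one opossum breaker per downstream dependency:

const CircuitBreaker = require('opossum');

const breakers = new Map();

// Create one breaker per dependency and reuse it on later calls
function getBreaker(name, action, options = {}) {
  if (!breakers.has(name)) {
    breakers.set(name, new CircuitBreaker(action, { name, timeout: 3000, ...options }));
  }
  return breakers.get(name);
}

// Hypothetical downstream calls, one per dependency
const chargeCard = async (order) => { /* call the payment gateway */ };
const fetchProfile = async (userId) => { /* call the profile service */ };

// A payment outage opens only the payments breaker, never the profile one
const paymentsBreaker = getBreaker('payments', chargeCard);
const profileBreaker = getBreaker('user-profile', fetchProfile);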

Operations

  • Alert on Open State - Know when services fail
  • Dashboard Visibility - Monitor circuit breaker health
  • Regular Testing - Chaos engineering
  • Review Thresholds - Adjust based on metrics
  • Document Incidents - Learn from failures


Pros & Cons

Advantages

  • Prevents Cascading Failures - Isolates failures
  • Fast Failure - No waiting for timeouts
  • Resource Protection - Prevents resource exhaustion
  • Automatic Recovery - Self-healing capability
  • Better UX - Faster error responses
  • Graceful Degradation - Fallback responses

Disadvantages

  • Added Complexity - More code to maintain
  • Configuration Challenge - Hard to tune correctly
  • False Positives - May open unnecessarily
  • Monitoring Overhead - Need good observability
  • Testing Complexity - Hard to test all scenarios


Real-World Examples

Netflix Hystrix

  • Protects 10+ billion requests/day
  • Prevents cascading failures
  • Provides fallback responses
  • Real-time monitoring dashboard

AWS API Gateway

  • Built-in throttling and rate limiting (circuit-breaker-style protection)
  • Integration timeouts bound how long backends are waited on
  • Protects backend services from overload

Kubernetes

  • Readiness/liveness probes
  • Service mesh circuit breakers (Istio)
  • Automatic pod restarts


Tools & Libraries

Node.js

  • opossum - Full-featured circuit breaker
  • cockatiel - Resilience patterns library
  • brakes - Hystrix-inspired breaker

Go

  • gobreaker - Simple circuit breaker
  • hystrix-go - Netflix Hystrix port

Java

  • Resilience4j - Modern resilience library
  • Hystrix - Netflix (maintenance mode)
  • Failsafe - Lightweight library

Python

  • pybreaker - Circuit breaker implementation
  • circuitbreaker - Simple decorator

Last Updated: January 5, 2026
Pattern Complexity: Medium
Recommended For: All distributed systems with external dependencies