# ADR-001: Circuit Breaker and Retry via gobreaker and avast/retry-go

**Status:** Accepted
**Date:** 2026-03-18

## Context

Outbound HTTP calls to external services are subject to transient failures (network blips, brief service restarts) and sustained failures (outages, overloads). Two complementary strategies address these cases:

- **Retry** recovers from transient failures by re-attempting the request a limited number of times before giving up.
- **Circuit breaking** detects sustained failure patterns and stops sending requests to a failing service, giving it time to recover and preventing the caller from accumulating blocked goroutines.

Implementing both from scratch risks subtle bugs (backoff arithmetic, state-machine transitions). Well-tested, widely adopted libraries are preferable.

## Decision

Two external libraries are composed:

**Retry: `github.com/avast/retry-go/v4`**

- Configured via `Config.MaxRetries` and `Config.RetryDelay`.
- Uses `retry.BackOffDelay` (exponential backoff) to avoid hammering a failing service.
- `retry.LastErrorOnly(true)` ensures only the final error from the retry loop is reported.
- Only HTTP 5xx responses trigger a retry; 4xx responses are not retried, since they represent caller errors, not server instability.

**Circuit breaker: `github.com/sony/gobreaker`**

- Configured via `Config.CBThreshold` (consecutive failures to trip) and `Config.CBTimeout` (time spent in the open state before transitioning to half-open).
- The retry loop runs inside the circuit breaker's `Execute` call, so a full retry sequence is a single attempt from the circuit breaker's perspective; it counts as a failure only if all retries fail.
- When the circuit opens, `Do` returns `xerrors.ErrUnavailable` immediately, without attempting the network call.
- State changes are logged via the duck-typed `Logger` interface.
The nesting order (circuit breaker wraps retry) is intentional: the circuit breaker accumulates failures at the level of "did the request ultimately succeed after retries", not at the level of individual attempts.

## Consequences

**Positive:**

- Transient failures are handled transparently for the caller.
- Sustained outages are detected quickly and the circuit opens, returning fast errors.
- Configuration is explicit and environment-variable driven.
- Circuit state changes are observable via logs.

**Negative:**

- Retry with exponential backoff increases total latency for failing requests: with a base delay of `RetryDelay`, the cumulative wait across `MaxRetries` retries is up to `RetryDelay * (2^MaxRetries - 1)` in the worst case.
- The circuit breaker counts only consecutive failures (`ConsecutiveFailures >= CBThreshold`), not a rolling failure rate. Interleaved successes reset the counter.
- `gobreaker.ErrOpenState` is wrapped in `xerrors.ErrUnavailable`, so callers must check for that specific code to distinguish circuit-open from an ordinary 503 response.