Resilient HTTP client with circuit breaking, exponential-backoff retry, X-Request-ID propagation, and a generic typed JSON helper.

What's included:
- `Client` interface with `Do(req)` method; `New(logger, cfg)` and `NewWithDefaults(logger)` constructors
- `Config` struct with env-tag support for timeout, dial timeout, retry, and circuit breaker parameters
- Retry via `avast/retry-go/v4` with `BackOffDelay`; triggers only on network errors and HTTP 5xx
- Circuit breaker via `sony/gobreaker` wrapping the full retry loop; open circuit → `xerrors.ErrUnavailable`
- `X-Request-ID` header propagated automatically from context via `logz.GetRequestID` on every attempt
- `DoJSON[T](ctx, client, req)` generic helper for typed JSON request/response with `xerrors` error mapping
- `MapStatusToError(code, msg)` exported function mapping HTTP status codes to `xerrors` types

Tested-via: todo-api POC integration
Reviewed-against: docs/adr/
ADR-001: Circuit Breaker and Retry via gobreaker and avast/retry-go
Status: Accepted
Date: 2026-03-18
Context
Outbound HTTP calls to external services are subject to transient failures (network blips, brief service restarts) and sustained failures (outages, overloads). Two complementary strategies address these cases:
- Retry recovers from transient failures by re-attempting the request a limited number of times before giving up.
- Circuit breaking detects sustained failure patterns and stops sending requests to a failing service, giving it time to recover and preventing the caller from accumulating blocked goroutines.
Implementing both from scratch introduces risk of subtle bugs (backoff arithmetic, state machine transitions). Well-tested, widely adopted libraries are preferable.
Decision
Two external libraries are composed:
Retry: github.com/avast/retry-go/v4
- Configured via `Config.MaxRetries` and `Config.RetryDelay`.
- Uses `retry.BackOffDelay` (exponential backoff) to avoid hammering a failing service.
- `retry.LastErrorOnly(true)` ensures only the final error from the retry loop is reported.
- Only network errors and HTTP 5xx responses trigger a retry. 4xx responses are not retried (they represent caller errors, not server instability).
Circuit breaker: github.com/sony/gobreaker
- Configured via `Config.CBThreshold` (consecutive failures to trip) and `Config.CBTimeout` (time in open state before transitioning to half-open).
- The retry loop runs inside the circuit breaker's `Execute` call. A full retry sequence counts as a single failure from the circuit breaker's perspective only if every attempt fails; if any attempt succeeds, it counts as a single success.
- When the circuit opens, `Do` returns `xerrors.ErrUnavailable` immediately, without attempting the network call.
- State changes are logged via the duck-typed `Logger` interface.
The nesting order (circuit breaker wraps retry) is intentional: the circuit breaker accumulates failures at the level of "did the request ultimately succeed after retries", not at the level of individual attempts.
Consequences
Positive:
- Transient failures are handled transparently by the caller.
- Sustained outages are detected quickly and the circuit opens, returning fast errors.
- Configuration is explicit and environment-variable driven.
- Circuit state changes are observable via logs.
Negative:
- Retry with backoff increases total latency for failing requests: with exponential backoff the cumulative delay is roughly `RetryDelay * (2^MaxRetries - 1)` in the worst case, on top of the time spent on the attempts themselves.
- The circuit breaker counts only consecutive failures (`ConsecutiveFailures >= CBThreshold`), not a rolling failure rate. Interleaved successes reset the counter.
- `gobreaker.ErrOpenState` is wrapped in `xerrors.ErrUnavailable`, so callers must check for this specific code to distinguish circuit-open from normal 503 responses.