Resilient HTTP client with circuit breaking, exponential-backoff retry, X-Request-ID
propagation, and a generic typed JSON helper.

What's included:

- `Client` interface with a `Do(req)` method; `New(logger, cfg)` and `NewWithDefaults(logger)` constructors
- `Config` struct with env-tag support for timeout, dial timeout, retry, and circuit breaker parameters
- Retry via `avast/retry-go/v4` with `BackOffDelay`; triggers only on network errors and HTTP 5xx
- Circuit breaker via `sony/gobreaker` wrapping the full retry loop; an open circuit yields `xerrors.ErrUnavailable`
- `X-Request-ID` header propagated automatically from context via `logz.GetRequestID` on every attempt
- `DoJSON[T](ctx, client, req)` generic helper for typed JSON request/response with `xerrors` error mapping
- `MapStatusToError(code, msg)` exported function mapping HTTP status codes to `xerrors` types

Tested-via: todo-api POC integration
Reviewed-against: docs/adr/
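As a quick orientation before the ADR itself, a hypothetical usage sketch of the typed
helper. The import path, the `Todo` type, and the logger choice are illustrative
assumptions; only `NewWithDefaults` and `DoJSON[T]` follow the summary above:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"

	"example.com/pkg/httpclient" // hypothetical module path
)

// Todo is an illustrative response type for the generic helper.
type Todo struct {
	ID    int    `json:"id"`
	Title string `json:"title"`
	Done  bool   `json:"done"`
}

func main() {
	// Any value satisfying the package's duck-typed Logger interface works;
	// log.Default() is assumed compatible here for brevity.
	client := httpclient.NewWithDefaults(log.Default())

	ctx := context.Background()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet,
		"https://todo.example.com/todos/1", nil)
	if err != nil {
		log.Fatal(err)
	}

	// The X-Request-ID header is injected from ctx on every attempt; errors
	// come back already mapped to xerrors values (e.g. ErrUnavailable).
	todo, err := httpclient.DoJSON[Todo](ctx, client, req)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("todo: %+v\n", todo)
}
```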
# ADR-001: Circuit Breaker and Retry via gobreaker and avast/retry-go
**Status:** Accepted
**Date:** 2026-03-18

## Context
Outbound HTTP calls to external services are subject to transient failures (network blips,
brief service restarts) and sustained failures (outages, overloads). Two complementary
strategies address these cases:

- **Retry** recovers from transient failures by re-attempting the request a limited number
  of times before giving up.
- **Circuit breaking** detects sustained failure patterns and stops sending requests to a
  failing service, giving it time to recover and preventing the caller from accumulating
  blocked goroutines.

Implementing both from scratch introduces the risk of subtle bugs (backoff arithmetic,
state machine transitions). Well-tested, widely adopted libraries are preferable.

## Decision
Two external libraries are composed:

**Retry: `github.com/avast/retry-go/v4`**

- Configured via `Config.MaxRetries` and `Config.RetryDelay`.
- Uses `retry.BackOffDelay` (exponential backoff) to avoid hammering a failing service.
- `retry.LastErrorOnly(true)` ensures only the final error from the retry loop is reported.
- Only network errors and HTTP 5xx responses trigger a retry; 4xx responses are not retried
  (they represent caller errors, not server instability). See the sketch below.
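A minimal sketch of this retry policy. The package name, the `errRetryable` sentinel, and
the exact `Attempts` semantics are assumptions for illustration, not the actual
implementation; the option names are real retry-go/v4 API:

```go
package httpclient // package name is an assumption; the ADR does not name it

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"

	retry "github.com/avast/retry-go/v4"
)

// Config shows only the retry knobs named in the ADR.
type Config struct {
	MaxRetries uint
	RetryDelay time.Duration
}

// errRetryable marks the failures the ADR treats as retryable:
// network errors and HTTP 5xx responses.
var errRetryable = errors.New("retryable")

// doWithRetry re-attempts req with exponential backoff. Requests with bodies
// would also need GetBody set so attempts can be replayed; omitted here.
func doWithRetry(ctx context.Context, hc *http.Client, req *http.Request, cfg Config) (*http.Response, error) {
	var resp *http.Response
	err := retry.Do(
		func() error {
			var attemptErr error
			resp, attemptErr = hc.Do(req)
			if attemptErr != nil {
				return fmt.Errorf("%w: %v", errRetryable, attemptErr) // network error
			}
			if resp.StatusCode >= 500 {
				resp.Body.Close()
				return fmt.Errorf("%w: status %d", errRetryable, resp.StatusCode)
			}
			return nil // 2xx-4xx are final: 4xx is a caller error, not instability
		},
		retry.Context(ctx),
		retry.Attempts(cfg.MaxRetries+1), // first attempt plus MaxRetries re-attempts (assumed semantics)
		retry.Delay(cfg.RetryDelay),
		retry.DelayType(retry.BackOffDelay), // exponential backoff
		retry.LastErrorOnly(true),           // surface only the final error
		retry.RetryIf(func(err error) bool { return errors.Is(err, errRetryable) }),
	)
	return resp, err
}
```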
**Circuit breaker: `github.com/sony/gobreaker`**

- Configured via `Config.CBThreshold` (consecutive failures to trip) and `Config.CBTimeout`
  (time in open state before transitioning to half-open).
- The retry loop runs inside the circuit breaker's `Execute` call, so a full retry sequence
  counts as a single outcome for the breaker: one failure if every retry fails, one success
  otherwise.
- When the circuit opens, `Do` returns `xerrors.ErrUnavailable` immediately, without
  attempting the network call.
- State changes are logged via the duck-typed `Logger` interface, as sketched below.
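A sketch of the breaker construction under those settings. The `Logger` method shown
(`Infof`) is an assumption, since the ADR only says the interface is duck-typed; the
`gobreaker.Settings` fields are the library's real API:

```go
package httpclient // continues the sketch above; still illustrative

import (
	"time"

	"github.com/sony/gobreaker"
)

// Logger is duck-typed in the real package; a single printf-style method is
// assumed here for illustration.
type Logger interface {
	Infof(format string, args ...any)
}

// Breaker knobs named in the ADR. ConsecutiveFailures in gobreaker.Counts is
// a uint32, hence the field type.
type BreakerConfig struct {
	CBThreshold uint32
	CBTimeout   time.Duration
}

func newBreaker(cfg BreakerConfig, log Logger) *gobreaker.CircuitBreaker {
	return gobreaker.NewCircuitBreaker(gobreaker.Settings{
		Name:    "httpclient",
		Timeout: cfg.CBTimeout, // time spent open before probing half-open
		ReadyToTrip: func(c gobreaker.Counts) bool {
			// Consecutive failures only; one interleaved success resets the count.
			return c.ConsecutiveFailures >= cfg.CBThreshold
		},
		OnStateChange: func(name string, from, to gobreaker.State) {
			log.Infof("circuit breaker %q: %s -> %s", name, from, to)
		},
	})
}
```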
The nesting order (circuit breaker wraps retry) is intentional: the circuit breaker
accumulates failures at the level of "did the request ultimately succeed after retries?",
not at the level of individual attempts.
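A condensed sketch of that composition, reusing the hypothetical pieces from the previous
snippets (`doWithRetry`, the config types); the exact wrapping of `gobreaker.ErrOpenState`
into `xerrors.ErrUnavailable` is one plausible implementation of what the ADR describes:

```go
// client gathers the pieces from the previous sketches; field names are assumed.
type client struct {
	hc  *http.Client
	cfg Config
	cb  *gobreaker.CircuitBreaker
}

func (c *client) Do(req *http.Request) (*http.Response, error) {
	result, err := c.cb.Execute(func() (any, error) {
		// The entire retry sequence runs inside one Execute call, so the
		// breaker records a single success or failure per logical request.
		return doWithRetry(req.Context(), c.hc, req, c.cfg)
	})
	if errors.Is(err, gobreaker.ErrOpenState) || errors.Is(err, gobreaker.ErrTooManyRequests) {
		// Fail fast: no network call was attempted. Both sentinels are kept in
		// the chain so callers can still detect circuit-open specifically.
		return nil, fmt.Errorf("%w: %w", xerrors.ErrUnavailable, err)
	}
	if err != nil {
		return nil, err
	}
	return result.(*http.Response), nil
}
```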
## Consequences

**Positive:**

- Transient failures are handled transparently, with no action required from the caller.
- Sustained outages are detected quickly; the circuit opens and errors return fast.
- Configuration is explicit and environment-variable driven.
- Circuit state changes are observable via logs.

**Negative:**

- Retry with backoff increases total latency for failing requests: with exponential backoff
  the cumulative delay is roughly `RetryDelay * (2^MaxRetries - 1)` in the worst case (for
  example, 700 ms with `MaxRetries = 3` and `RetryDelay = 100ms`).
- The circuit breaker counts only consecutive failures (`ConsecutiveFailures >= CBThreshold`),
  not a rolling failure rate. Interleaved successes reset the counter.
- `gobreaker.ErrOpenState` is wrapped in `xerrors.ErrUnavailable`, so callers that need to
  distinguish circuit-open from an ordinary upstream 503 must check for that specific error
  in the chain (see the sketch below).
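As a caller-side illustration of that last point, a sketch that assumes the wrapping above
keeps `gobreaker.ErrOpenState` in the error chain alongside `xerrors.ErrUnavailable`:

```go
func fetch(c *client, req *http.Request) (*http.Response, error) {
	resp, err := c.Do(req)
	switch {
	case errors.Is(err, gobreaker.ErrOpenState):
		// Circuit-open: nothing was sent; back off before trying again.
		return nil, err
	case errors.Is(err, xerrors.ErrUnavailable):
		// An ordinary upstream 503 (mapped by MapStatusToError) or other
		// unavailability.
		return nil, err
	case err != nil:
		return nil, err
	}
	return resp, nil
}
```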