Release v0.9.0 Stable
2026-03-18 14:07:12 -06:00
code.nochebuena.dev/go/health

Overview
`health` provides a single `http.Handler` that interrogates any number of registered infrastructure components concurrently and returns a structured JSON response with per-component status and an overall service status. It is designed to be mounted at `/health` and consumed by load balancers, container orchestrators, and uptime monitors. The two-level criticality model (`LevelCritical`/`LevelDegraded`) allows a service to report partial availability — degraded but still serving — rather than forcing a binary UP/DOWN distinction.

What's Included
- `Level` type — `int` representing component criticality; `LevelCritical` (value 0, default) and `LevelDegraded` (value 1)
- `Checkable` interface — `HealthCheck(ctx context.Context) error`, `Name() string`, `Priority() Level`
- `Logger` interface — duck-typed minimal logger satisfied by `logz.Logger`; defined locally so `logz` is not imported
- `ComponentStatus` struct — JSON-serialisable per-component result with `status`, `latency`, and `error` fields
- `Response` struct — JSON-serialisable overall response with `status` and `components` map
- `NewHandler(logger Logger, checks ...Checkable) http.Handler` — constructs the health handler; runs all checks concurrently with a 5 s timeout derived from the request context
Installation
`require code.nochebuena.dev/go/health v0.9.0`

Design Highlights
- All registered checks run in parallel goroutines; results are collected via a buffered channel sized to the number of checks, preventing goroutine leaks if the handler returns early (see docs/adr/ADR-001-parallel-checks.md).
- The two-level criticality model means a `LevelDegraded` component failure produces HTTP 200 with `"status":"DEGRADED"`, while a `LevelCritical` failure produces HTTP 503 with `"status":"DOWN"`, giving orchestrators and load balancers a clean binary signal while still surfacing partial degradation to monitoring (see docs/adr/ADR-002-two-level-criticality.md).
- `Checkable` is defined in this package and implemented by infrastructure components — the dependency flows one way: infra → health (see docs/adr/ADR-003-checkable-interface.md).
- The `Logger` interface is declared locally as a duck-typed subset of `logz.Logger`, so `health` has no micro-lib imports and remains a pure stdlib package.
Known Limitations & Edge Cases
- The 5 s check timeout is hardcoded as `context.WithTimeout(r.Context(), 5*time.Second)`; it is not configurable via `NewHandler` options or per-check. A check that consistently takes close to 5 s will produce noisy latency values in the response.
- Check results are never cached: every HTTP request to the health endpoint triggers a live round of all checks. High-frequency polling (e.g. load balancer health probes every 1 s) produces one full set of DB/network round-trips per probe.
- If the request context is cancelled before the 5 s deadline (e.g. the client disconnects), checks that have not yet completed see a cancelled context, but goroutines already dispatched run to their natural completion or until `ctx.Done()` fires — there is no forced goroutine cancellation.
- `NewHandler` called with a nil logger panics on the first request, when it calls `logger.WithContext`. Callers must supply a non-nil logger.
- The zero value of `Level` is `LevelCritical`, so forgetting to set `Priority()` on a `Checkable` implementation silently defaults the component to critical — a misconfiguration hazard for components intended to be degraded-only.
- The JSON response does not distinguish a check that timed out from one that returned an explicit error — both appear as a non-empty `error` field in `ComponentStatus`.
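The zero-value pitfall above is easy to reproduce. A minimal sketch, with `Level` mirrored locally from the package's declared constants and `baseCheck` a hypothetical embeddable helper:

```go
package main

import "fmt"

// Level mirrored from the release notes: the zero value is LevelCritical.
type Level int

const (
	LevelCritical Level = iota
	LevelDegraded
)

// baseCheck demonstrates the pitfall: if level is never set, the zero
// value LevelCritical applies, and a failure takes the whole service DOWN.
type baseCheck struct {
	level Level
}

func (b baseCheck) Priority() Level { return b.level }

func main() {
	var forgotten baseCheck // level left at its zero value
	explicit := baseCheck{level: LevelDegraded}

	fmt.Println(forgotten.Priority() == LevelCritical) // silently critical
	fmt.Println(explicit.Priority() == LevelDegraded)  // intended behaviour
}
```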
v0.9.0 → v1.0.0 Roadmap
- Make the check timeout configurable via a
NewHandleroption (e.g.Options{CheckTimeout time.Duration}), with 5 s as the default. - Add optional result caching (TTL-based) to prevent every load-balancer probe from generating live infrastructure round-trips.
- Distinguish timeout errors from explicit check errors in the
ComponentStatusJSON, so monitoring dashboards can differentiate slow checks from failing ones. - Validate that the buffered channel goroutine-leak prevention holds correctly when the check count changes dynamically (it does not currently — checks are fixed at construction time, which is correct, but should be documented explicitly).
- Achieve production validation of the parallel check path under concurrent load balancer probe traffic.
v0.9.0 rationale: The API is stable and intentional — designed through multiple architecture reviews and tested end-to-end via the todo-api POC (SQLite, RBAC, middleware stack, HTTP handlers). The module is not yet battle-tested in production for all edge cases, and the pre-1.0 designation preserves the option for minor API refinements based on real-world use.