Release v0.9.0 Stable
2026-03-18 14:07:12 -06:00
code.nochebuena.dev/go/health

Overview
`health` provides a single `http.Handler` that interrogates any number of registered infrastructure components concurrently and returns a structured JSON response with per-component status and an overall service status. It is designed to be mounted at `/health` and consumed by load balancers, container orchestrators, and uptime monitors. The two-level criticality model (`LevelCritical`/`LevelDegraded`) allows a service to report partial availability — degraded but still serving — rather than forcing a binary UP/DOWN distinction.

What's Included
- `Level` type — `int` representing component criticality; `LevelCritical` (value 0, default) and `LevelDegraded` (value 1)
- `Checkable` interface — `HealthCheck(ctx context.Context) error`, `Name() string`, `Priority() Level`
- `Logger` interface — duck-typed minimal logger satisfied by `logz.Logger`; defined locally so `logz` is not imported
- `ComponentStatus` struct — JSON-serialisable per-component result with `status`, `latency`, and `error` fields
- `Response` struct — JSON-serialisable overall response with `status` and `components` map
- `NewHandler(logger Logger, checks ...Checkable) http.Handler` — constructs the health handler; runs all checks concurrently with a 5 s timeout derived from the request context
Installation
`require code.nochebuena.dev/go/health v0.9.0`

Design Highlights
- All registered checks run in parallel goroutines; results are collected via a buffered channel sized to the number of checks, preventing goroutine leaks if the handler returns early (see docs/adr/ADR-001-parallel-checks.md).
- The two-level criticality model means a `LevelDegraded` component failure produces HTTP 200 with `"status":"DEGRADED"`, while a `LevelCritical` failure produces HTTP 503 with `"status":"DOWN"`, giving orchestrators and load balancers a clean binary signal while still surfacing partial degradation to monitoring (see docs/adr/ADR-002-two-level-criticality.md).
- `Checkable` is defined in this package and implemented by infrastructure components — the dependency flows one way: infra → health (see docs/adr/ADR-003-checkable-interface.md).
- The `Logger` interface is declared locally as a duck-typed subset of `logz.Logger`, so `health` has no micro-lib imports and remains a pure stdlib package.
Known Limitations & Edge Cases
- The 5 s check timeout is hardcoded as `context.WithTimeout(r.Context(), 5*time.Second)`; it is not configurable via `NewHandler` options or per-check. A check that consistently takes close to 5 s will produce noisy latency values in the response.
- Check results are never cached: every HTTP request to the health endpoint triggers a live round of all checks. High-frequency polling (e.g. load balancer health probes every 1 s) produces one full set of DB/network round-trips per probe.
- If the request context is cancelled before the 5 s deadline (e.g. the client disconnects), checks that have not yet completed see a cancelled context, but goroutines already dispatched run to their natural completion or until `ctx.Done()` fires — there is no forced goroutine cancellation.
- `NewHandler` called with a nil logger panics on the first request, when it calls `logger.WithContext`. Callers must supply a non-nil logger.
- The zero value of `Level` is `LevelCritical`, so forgetting to set `Priority()` on a `Checkable` implementation silently defaults the component to critical — a misconfiguration hazard for components intended to be degraded-only.
- The JSON response does not distinguish a check that timed out from one that returned an explicit error — both appear as a non-empty `error` field in `ComponentStatus`.
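The zero-value pitfall above is easy to reproduce. A minimal sketch, with `Level` mirrored locally from the package's declared constants and `baseCheck` a hypothetical embeddable helper:

```go
package main

import "fmt"

// Level mirrored from the release notes: the zero value is LevelCritical.
type Level int

const (
	LevelCritical Level = iota
	LevelDegraded
)

// baseCheck demonstrates the pitfall: if level is never set, the zero
// value LevelCritical applies, and a failure takes the whole service DOWN.
type baseCheck struct {
	level Level
}

func (b baseCheck) Priority() Level { return b.level }

func main() {
	var forgotten baseCheck // level left at its zero value
	explicit := baseCheck{level: LevelDegraded}

	fmt.Println(forgotten.Priority() == LevelCritical) // silently critical
	fmt.Println(explicit.Priority() == LevelDegraded)  // intended behaviour
}
```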
v0.9.0 → v1.0.0 Roadmap
- Make the check timeout configurable via a
NewHandleroption (e.g.Options{CheckTimeout time.Duration}), with 5 s as the default. - Add optional result caching (TTL-based) to prevent every load-balancer probe from generating live infrastructure round-trips.
- Distinguish timeout errors from explicit check errors in the
ComponentStatusJSON, so monitoring dashboards can differentiate slow checks from failing ones. - Validate that the buffered channel goroutine-leak prevention holds correctly when the check count changes dynamically (it does not currently — checks are fixed at construction time, which is correct, but should be documented explicitly).
- Achieve production validation of the parallel check path under concurrent load balancer probe traffic.
v0.9.0 rationale: The API is stable and intentional — designed through multiple architecture reviews and tested end-to-end via the todo-api POC (SQLite, RBAC, middleware stack, HTTP handlers). The module is not yet battle-tested in production for all edge cases, and the pre-1.0 designation preserves the option for minor API refinements based on real-world use.