• v0.9.0 e1b6b7ddd7

    Rene Nochebuena released this 2026-03-18 14:07:12 -06:00 | 0 commits to main since this release

    v0.9.0

    code.nochebuena.dev/go/health

    Overview

    health provides a single http.Handler that interrogates any number of registered infrastructure components concurrently and returns a structured JSON response with per-component status and an overall service status. It is designed to be mounted at /health and consumed by load balancers, container orchestrators, and uptime monitors. The two-level criticality model (LevelCritical / LevelDegraded) allows a service to report partial availability — degraded but still serving — rather than forcing a binary UP/DOWN distinction.

    What's Included

    • Level type — int representing component criticality; LevelCritical (value 0, default) and LevelDegraded (value 1)
    • Checkable interface — HealthCheck(ctx context.Context) error, Name() string, Priority() Level
    • Logger interface — duck-typed minimal logger satisfied by logz.Logger; defined locally so logz is not imported
    • ComponentStatus struct — JSON-serialisable per-component result with status, latency, and error fields
    • Response struct — JSON-serialisable overall response with status and components map
    • NewHandler(logger Logger, checks ...Checkable) http.Handler — constructs the health handler; runs all checks concurrently with a 5 s timeout derived from the request context

    Installation

    require code.nochebuena.dev/go/health v0.9.0
    

    Design Highlights

    • All registered checks run in parallel goroutines; results are collected via a buffered channel sized to the number of checks, preventing goroutine leaks if the handler returns early (see docs/adr/ADR-001-parallel-checks.md).
    • The two-level criticality model means a LevelDegraded component failure produces HTTP 200 with "status":"DEGRADED", while a LevelCritical failure produces HTTP 503 with "status":"DOWN", giving orchestrators and load balancers a clean binary signal while still surfacing partial degradation to monitoring (see docs/adr/ADR-002-two-level-criticality.md).
    • Checkable is defined in this package and implemented by infrastructure components — the dependency flows one way: infra → health (see docs/adr/ADR-003-checkable-interface.md).
    • The Logger interface is declared locally as a duck-typed subset of logz.Logger, so health has no micro-lib imports and remains a pure stdlib package.

    Known Limitations & Edge Cases

    • The 5 s check timeout is hardcoded as context.WithTimeout(r.Context(), 5*time.Second). It is not configurable via NewHandler options or per-check. A check that consistently takes close to 5 s will produce noisy latency values in the response.
    • Check results are never cached. Every HTTP request to the health endpoint triggers a live round of all checks. High-frequency polling (e.g. load balancer health probes every 1 s) will produce one full set of DB/network round-trips per probe.
    • If the request context is cancelled before the 5 s deadline (e.g. the client disconnects), checks that have not yet completed will see a cancelled context, but goroutines already dispatched will run to their natural completion or until ctx.Done() fires — there is no forced goroutine cancellation.
    • NewHandler called with a nil logger will panic on the first request when it calls logger.WithContext. Callers must supply a non-nil logger.
    • The zero value of Level is LevelCritical. Forgetting to set Priority() on a Checkable implementation defaults to critical — a silent misconfiguration for components intended to be degraded-only.
    • There is no distinction in the JSON response between a check that timed out and one that returned an explicit error — both are represented as a non-empty error field in ComponentStatus.

    v0.9.0 → v1.0.0 Roadmap

    • Make the check timeout configurable via a NewHandler option (e.g. Options{CheckTimeout time.Duration}), with 5 s as the default.
    • Add optional result caching (TTL-based) to prevent every load-balancer probe from generating live infrastructure round-trips.
    • Distinguish timeout errors from explicit check errors in the ComponentStatus JSON, so monitoring dashboards can differentiate slow checks from failing ones.
    • Document explicitly that the check set is fixed at construction time: the buffered-channel goroutine-leak guarantee relies on a static check count, and checks cannot be added or removed after NewHandler returns. This is intentional and correct, but currently undocumented.
    • Validate the parallel check path in production under concurrent load-balancer probe traffic.

    v0.9.0 rationale: The API is stable and intentional — designed through multiple architecture reviews and tested end-to-end via the todo-api POC (SQLite, RBAC, middleware stack, HTTP handlers). The module is not yet battle-tested in production for all edge cases, and the pre-1.0 designation preserves the option for minor API refinements based on real-world use.

    Downloads