feat(telemetry): initial stable release v0.9.0

Single-call OTel SDK bootstrap setting all three global providers (traces → Tempo, metrics → Mimir, logs → Loki) over OTLP gRPC.

What's included:
- New(ctx, Config): bootstraps TracerProvider, MeterProvider, and LoggerProvider with OTLP gRPC exporters; sets OTel globals
- W3C TraceContext + Baggage propagation set globally
- Resource tagging: service.name, service.version, deployment.environment merged with SDK defaults
- OTLPInsecure bool for development environments without TLS
- Sequential rollback on partial initialization failure — no dangling exporters on error
- Returns shutdown func(context.Context) error; caller defers in main or wires into launcher BeforeStop
- Tier 5 module: must be imported only by application main packages; zero micro-lib dependencies

Tested-via: todo-api POC integration
Reviewed-against: docs/adr/
2026-03-18 14:13:29 -06:00
commit ed4e9ef161
14 changed files with 787 additions and 0 deletions

@@ -0,0 +1,33 @@
# ADR-001: Tier 5 — Application Bootstrap Only
**Status:** Accepted
**Date:** 2026-03-18
## Context
OpenTelemetry separates the **API** (stable, and effectively free at runtime when no SDK is wired) from the **SDK** (the real implementation, with exporters, batch processors, and gRPC connections). Any package can import the OTel API and call `otel.Tracer(...)`, `otel.Meter(...)`, etc. at negligible runtime cost: these calls are no-ops until the corresponding SDK provider is set as the global.
The question is where in the module tier hierarchy the SDK bootstrap belongs. The options are:
1. Include telemetry bootstrap in each micro-lib that produces signals (e.g., httpserver starts its own SDK).
2. Provide a standalone bootstrap module imported only by application `main` packages.
Option 1 would cause multiple SDK initializations and competing global registrations, and would make it impossible for the application to control the exporter endpoint or sampling strategy. It would also force every micro-lib to carry the heavy OTel SDK as a dependency even when the application does not use telemetry.
## Decision
The `telemetry` module is **Tier 5** — the same tier as application bootstrap entry points. It must only be imported by application `main` packages (or equivalent wiring code). It must never be imported by:
- Framework libraries (Tier 0–3)
- Transport modules (Tier 4)
- Other Tier 5 modules that are not themselves `main`
Micro-libs use only the OTel API packages (`go.opentelemetry.io/otel`, `go.opentelemetry.io/otel/metric`, `go.opentelemetry.io/otel/log`) which default to no-op providers. When an application imports `telemetry` and calls `telemetry.New(...)`, the three global providers are replaced with real SDK providers, and all micro-libs that use the global API automatically emit signals without any change to their code.
## Consequences
- No micro-lib needs to import or configure `telemetry`. The OTel no-op default means libraries compile and run correctly in unit tests without any collector present.
- Applications that do not call `telemetry.New(...)` produce no signals. This is correct — telemetry is opt-in at the application level.
- The `telemetry` module carries heavy SDK dependencies (OTLP gRPC exporters, batch processors). These do not appear in any library's dependency graph.
- Code review must reject any PR that imports `telemetry` from a non-`main` package. Currently this is enforced by convention, not by a build tool.
- There is no `launcher.Component` wrapper for telemetry. The caller is responsible for deferring the shutdown function, which flushes all exporters before process exit.
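
The import rule above is enforced by convention only, but it could in principle be checked mechanically. The following is a minimal sketch using the standard library's `go/parser`, not part of this module. It assumes a hypothetical import path `example.com/telemetry` (the real module path is not stated here) and treats only `package main` as allowed, ignoring the "equivalent wiring code" exception.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// telemetryImportPath is a hypothetical import path used for illustration;
// substitute the real module path.
const telemetryImportPath = "example.com/telemetry"

// violatesTier5 reports whether the given Go source imports the telemetry
// module from a package other than main.
func violatesTier5(filename, src string) (bool, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations.
	file, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return false, err
	}
	if file.Name.Name == "main" {
		return false, nil // application entry points may import telemetry
	}
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return false, err
		}
		if path == telemetryImportPath {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	sources := []struct{ name, src string }{
		{"lib.go", "package httpserver\n\nimport \"example.com/telemetry\"\n"},
		{"main.go", "package main\n\nimport \"example.com/telemetry\"\n"},
	}
	for _, s := range sources {
		bad, err := violatesTier5(s.name, s.src)
		fmt.Printf("%s: violation=%v err=%v\n", s.name, bad, err)
	}
}
```

A check like this could run over the repository in CI; whether that is worth the maintenance cost compared with code review is a separate decision.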

@@ -0,0 +1,40 @@
# ADR-002: Three-Signal OTLP gRPC Bootstrap
**Status:** Accepted
**Date:** 2026-03-18
## Context
OpenTelemetry defines three observability signals:
- **Traces** — distributed trace spans (latency, call graphs)
- **Metrics** — counters, gauges, histograms
- **Logs** — structured log records correlated with trace context
The target observability stack is the Grafana LGTM stack: **Loki** (logs), **Grafana** (dashboards), **Tempo** (traces), **Mimir** (metrics), fronted by **Grafana Alloy** as the OTLP collector/router.
The question is what to bootstrap and how to transport signals to the collector. Options include:
- Bootstrap only traces (the most common starting point), add others later.
- Bootstrap all three signals in one call, using a shared OTLP gRPC endpoint.
- Use per-signal configuration with separate endpoints.
## Decision
`telemetry.New(ctx, cfg)` bootstraps all three signals in a single call using a shared OTLP gRPC endpoint (`cfg.OTLPEndpoint`, e.g. `"alloy:4317"`):
1. **TracerProvider**: `sdktrace.NewTracerProvider` with an OTLP gRPC batch exporter; W3C TraceContext + Baggage propagation set globally via `otel.SetTextMapPropagator`.
2. **MeterProvider**: `sdkmetric.NewMeterProvider` with an OTLP gRPC periodic reader.
3. **LoggerProvider**: `sdklog.NewLoggerProvider` with an OTLP gRPC batch processor.
All three providers share one `*resource.Resource` built from `cfg.ServiceName`, `cfg.ServiceVersion`, and `cfg.Environment`, merged with the OTel default resource, which contributes `service.instance.id` and SDK metadata.
Error handling during bootstrap is sequential and rolls back already-created providers: if metric exporter creation fails, the trace provider is shut down before returning the error; if log exporter creation fails, both trace and metric providers are shut down.
The returned `shutdown` function joins the shutdown of all three providers with `errors.Join`, so a single `defer shutdown(ctx)` flushes and closes all exporters.
## Consequences
- One `Config` struct covers all three signals. Per-signal endpoint overrides are not supported in the current design. If per-signal routing is needed, Grafana Alloy handles that at the collector level.
- `OTLPInsecure: true` disables TLS on all three signal connections simultaneously. This is the expected setting for local development (Alloy runs on localhost or in the same Docker network).
- Failing to initialize any one of the three exporters aborts the entire bootstrap. A partially initialized telemetry state (e.g., traces but no metrics) is considered more dangerous than failing fast.
- The W3C TraceContext propagator is set globally. Applications that need custom propagators (e.g., B3) must call `otel.SetTextMapPropagator` after `telemetry.New` to override.
- All three providers use batch/periodic export. Synchronous export is not available through this bootstrap path.

@@ -0,0 +1,43 @@
# ADR-003: OTel API vs SDK Separation — Global Provider Strategy
**Status:** Accepted
**Date:** 2026-03-18
## Context
OpenTelemetry Go has a two-package model:
- **API packages** (`go.opentelemetry.io/otel`, `.../otel/metric`, `.../otel/log`) — stable, backward-compatible interfaces. When called with no SDK registered, all operations are no-ops with zero allocation.
- **SDK packages** (`go.opentelemetry.io/otel/sdk/...`) — concrete implementations with exporters, processors, samplers. These have real runtime cost and external dependencies.
Micro-libs (httpserver, httpmw, logz, etc.) need to emit spans, metrics, or log records. They must not carry SDK dependencies. The question is how to connect API calls in libraries to the real SDK without importing SDK packages from libraries.
The two main strategies are:
1. **Explicit injection** — each library accepts a `TracerProvider`, `MeterProvider`, or `LoggerProvider` as a constructor argument, and the application injects the real SDK provider.
2. **Global provider** — libraries call `otel.Tracer(...)` / `otel.Meter(...)` / `global.Logger(...)` which consult the process-wide global provider. The application sets that global once at startup.
## Decision
Use the **OTel global provider** strategy. Micro-libs obtain tracers, meters, and loggers from the OTel global API. `telemetry.New(...)` sets all three globals:
```go
otel.SetTracerProvider(tp) // traces
otel.SetMeterProvider(mp) // metrics
global.SetLoggerProvider(lp) // logs (go.opentelemetry.io/otel/log/global)
```
This means:
- Libraries have zero SDK dependency. They only import `go.opentelemetry.io/otel` (and sub-packages for metric/log API).
- Before `telemetry.New` is called, all OTel calls in libraries are no-ops — correct behavior in unit tests and in applications that don't use telemetry.
- After `telemetry.New` is called, all OTel calls in libraries automatically route to the real OTLP exporters with no code change required in the libraries.
Explicit injection was considered but rejected because:
- It forces every library constructor to accept provider arguments even when the application doesn't use telemetry.
- It makes the calling code more verbose (every `New(logger, cfg, tracerProvider, meterProvider, ...)`) without clear benefit in a single-process application.
- The global approach is the design intent of the OTel Go project for application-level bootstrap.
## Consequences
- The global providers are process-global mutable state. Tests that call `telemetry.New` will affect other tests running in the same process if tests run in parallel. The test suite uses a fake collector and short shutdown timeouts to mitigate this.
- If a library is used in a context where the global provider has not been set (e.g., a library test), all OTel calls are no-ops. This is correct and expected.
- If an application calls `telemetry.New` more than once (e.g., through a misconfigured init), each call overwrites the globals set by the previous one. Only one call to `telemetry.New` should occur per process.
- The `go.opentelemetry.io/otel/log/global` package is a separate import from `go.opentelemetry.io/otel` because the log signal API was stabilized later. Libraries using the log API must import the `log/global` sub-package to obtain loggers via `global.Logger(...)`; `telemetry` uses the same package to call `global.SetLoggerProvider`.