telemetry/docs/adr/ADR-002-three-signal-otlp-bootstrap.md
Rene Nochebuena ed4e9ef161 feat(telemetry): initial stable release v0.9.0
Single-call OTel SDK bootstrap setting all three global providers (traces → Tempo, metrics → Mimir, logs → Loki) over OTLP gRPC.

What's included:
- New(ctx, Config): bootstraps TracerProvider, MeterProvider, and LoggerProvider with OTLP gRPC exporters; sets OTel globals
- W3C TraceContext + Baggage propagation set globally
- Resource tagging: service.name, service.version, deployment.environment merged with SDK defaults
- OTLPInsecure bool for development environments without TLS
- Sequential rollback on partial initialization failure — no dangling exporters on error
- Returns shutdown func(context.Context) error; caller defers in main or wires into launcher BeforeStop
- Tier 5 module: must be imported only by application main packages; zero micro-lib dependencies

Tested-via: todo-api POC integration
Reviewed-against: docs/adr/
2026-03-18 14:13:29 -06:00

# ADR-002: Three-Signal OTLP gRPC Bootstrap
**Status:** Accepted
**Date:** 2026-03-18
## Context
OpenTelemetry defines three observability signals:
- **Traces** — distributed trace spans (latency, call graphs)
- **Metrics** — counters, gauges, histograms
- **Logs** — structured log records correlated with trace context
The target observability stack is the Grafana LGTM stack: **Loki** (logs), **Grafana** (dashboards), **Tempo** (traces), **Mimir** (metrics), fronted by **Grafana Alloy** as the OTLP collector/router.
The question is what to bootstrap and how to transport signals to the collector. Options include:
- Bootstrap only traces (the most common starting point), add others later.
- Bootstrap all three signals in one call, using a shared OTLP gRPC endpoint.
- Use per-signal configuration with separate endpoints.
## Decision
`telemetry.New(ctx, cfg)` bootstraps all three signals in a single call using a shared OTLP gRPC endpoint (`cfg.OTLPEndpoint`, e.g. `"alloy:4317"`):
1. **TracerProvider**: `sdktrace.NewTracerProvider` with an OTLP gRPC batch exporter; W3C TraceContext + Baggage propagation set globally via `otel.SetTextMapPropagator`.
2. **MeterProvider**: `sdkmetric.NewMeterProvider` with an OTLP gRPC periodic reader.
3. **LoggerProvider**: `sdklog.NewLoggerProvider` with an OTLP gRPC batch processor.
All three providers share one `*resource.Resource` built from `cfg.ServiceName`, `cfg.ServiceVersion`, and `cfg.Environment`, merged with the OTel default resource, which contributes `service.instance.id` and SDK metadata.
Error handling during bootstrap is sequential and rolls back already-created providers: if metric exporter creation fails, the trace provider is shut down before returning the error; if log exporter creation fails, both trace and metric providers are shut down.
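
The rollback rule can be sketched with the standard library alone. This is a minimal illustration, not the module's code: `shutdowner`, `fake`, `initAll`, and `rolledBack` are hypothetical names, the fake providers stand in for the real SDK providers, and rolling back in creation order is an assumption (the ADR does not state an order).

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// shutdowner is a hypothetical stand-in for an SDK provider; the real
// TracerProvider, MeterProvider, and LoggerProvider all expose Shutdown(ctx).
type shutdowner interface {
	Shutdown(ctx context.Context) error
}

// fake records its shutdown in calls so the rollback is observable.
type fake struct {
	name  string
	calls *[]string
}

func (f fake) Shutdown(context.Context) error {
	*f.calls = append(*f.calls, f.name)
	return nil
}

// initAll builds the signals in order. If one fails, every provider
// created before it is shut down and the build error is returned.
func initAll(ctx context.Context, names []string, build func(string) (shutdowner, error)) ([]shutdowner, error) {
	var created []shutdowner
	for _, n := range names {
		p, err := build(n)
		if err != nil {
			errs := []error{fmt.Errorf("init %s: %w", n, err)}
			for _, c := range created {
				// Rollback errors are kept; nil results are dropped by errors.Join.
				errs = append(errs, c.Shutdown(ctx))
			}
			return nil, errors.Join(errs...)
		}
		created = append(created, p)
	}
	return created, nil
}

// rolledBack reports which providers were shut down when failAt fails.
func rolledBack(failAt string) ([]string, error) {
	var calls []string
	_, err := initAll(context.Background(), []string{"traces", "metrics", "logs"},
		func(n string) (shutdowner, error) {
			if n == failAt {
				return nil, errors.New("exporter unavailable")
			}
			return fake{n, &calls}, nil
		})
	return calls, err
}

func main() {
	fmt.Println(rolledBack("metrics")) // [traces] init metrics: exporter unavailable
	fmt.Println(rolledBack("logs"))    // [traces metrics] init logs: exporter unavailable
}
```

In this sketch the caller never receives a partially built set of providers: it gets either all of them or an error.
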
The returned `shutdown` function joins the shutdown of all three providers with `errors.Join`, so a single `defer shutdown(ctx)` flushes and closes all exporters.
## Consequences
- One `Config` struct covers all three signals. Per-signal endpoint overrides are not supported in the current design. If per-signal routing is needed, Grafana Alloy handles that at the collector level.
- `OTLPInsecure: true` disables TLS on all three signal connections simultaneously. This is the expected setting for local development (Alloy runs on localhost or in the same Docker network).
- Failing to initialize any one of the three exporters aborts the entire bootstrap. A partially initialized telemetry state (e.g., traces but no metrics) is considered more dangerous than failing fast.
- The W3C TraceContext and Baggage propagators are set globally. Applications that need a different propagator (e.g., B3) must call `otel.SetTextMapPropagator` after `telemetry.New` to override the default.
- All three providers use batch/periodic export. Synchronous export is not available through this bootstrap path.
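
Because export is batched, records still queued at exit are only sent when the shutdown function runs, so callers may want to bound how long that flush can take. The following is a sketch under assumptions: `shutdownFunc`, `flushWithin`, and `slowShutdown` are hypothetical names, and `slowShutdown` merely simulates a flush instead of calling the function returned by `telemetry.New`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// shutdownFunc has the same signature as the func returned by telemetry.New.
type shutdownFunc func(context.Context) error

// flushWithin calls shutdown with a deadline so a slow or unreachable
// collector cannot block process exit indefinitely.
func flushWithin(shutdown shutdownFunc, limit time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	return shutdown(ctx)
}

// slowShutdown is a hypothetical stand-in that simulates a flush taking d
// and gives up as soon as the context is done.
func slowShutdown(d time.Duration) shutdownFunc {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

func main() {
	// Flush finishes well inside the limit.
	fmt.Println(flushWithin(slowShutdown(10*time.Millisecond), time.Second))

	// Flush would take longer than the limit, so the deadline wins.
	err := flushWithin(slowShutdown(time.Second), 10*time.Millisecond)
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```

In an application `main`, the same helper could be deferred right after a successful `telemetry.New`, for example `defer flushWithin(shutdown, 5*time.Second)`; the five-second limit is an arbitrary illustrative value.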