telemetry/CLAUDE.md
Rene Nochebuena ed4e9ef161 feat(telemetry): initial stable release v0.9.0
Single-call OTel SDK bootstrap setting all three global providers (traces → Tempo, metrics → Mimir, logs → Loki) over OTLP gRPC.

What's included:
- New(ctx, Config): bootstraps TracerProvider, MeterProvider, and LoggerProvider with OTLP gRPC exporters; sets OTel globals
- W3C TraceContext + Baggage propagation set globally
- Resource tagging: service.name, service.version, deployment.environment merged with SDK defaults
- OTLPInsecure bool for development environments without TLS
- Sequential rollback on partial initialization failure — no dangling exporters on error
- Returns shutdown func(context.Context) error; caller defers in main or wires into launcher BeforeStop
- Tier 5 module: must be imported only by application main packages; zero micro-lib dependencies

Tested-via: todo-api POC integration
Reviewed-against: docs/adr/
2026-03-18 14:13:29 -06:00

# telemetry
Bootstraps the full OpenTelemetry SDK (traces, metrics, logs) with OTLP gRPC exporters targeting Grafana Alloy.
## Purpose
Sets the three OTel global providers so that all micro-libs using the OTel global API auto-instrument without any code changes. Returns a shutdown function that flushes all exporters on process exit. This module is the single place in an application where the OTel SDK is wired up.
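The mechanism described above can be modelled in a few lines of plain Go. This is a toy sketch using only the standard library, not the OTel API: the names `Tracer`, `SetTracer`, `GetTracer`, and `libraryCall` are hypothetical. It only illustrates why library code needs no changes when the application swaps a no-op default for a real implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// Tracer is a toy stand-in for an instrumentation handle (hypothetical).
type Tracer interface {
	Start(op string)
}

// noopTracer is the default: calls succeed and record nothing.
type noopTracer struct{}

func (noopTracer) Start(string) {}

// recordingTracer plays the role of the "real SDK" in this model.
type recordingTracer struct{ ops *[]string }

func (t recordingTracer) Start(op string) { *t.ops = append(*t.ops, op) }

var (
	mu     sync.RWMutex
	global Tracer = noopTracer{} // no-op until the application bootstraps
)

// SetTracer is what an application-level bootstrap would call once.
func SetTracer(t Tracer) {
	mu.Lock()
	defer mu.Unlock()
	global = t
}

// GetTracer is the only thing library code touches.
func GetTracer() Tracer {
	mu.RLock()
	defer mu.RUnlock()
	return global
}

// libraryCall represents code inside a micro-lib. It never imports the bootstrap.
func libraryCall() { GetTracer().Start("db.query") }

func main() {
	var ops []string

	libraryCall() // before bootstrap: routed to the no-op, nothing recorded
	SetTracer(recordingTracer{ops: &ops})
	libraryCall() // after bootstrap: same library code, now recorded

	fmt.Println(ops)
}
```

The model resolves the global on every call, which keeps it short; it makes no claim about how the OTel global package handles handles obtained before the provider is set.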
## Tier & Dependencies
**Tier 5** (application bootstrap only). Must never be imported by framework libraries (Tier 0–4).
Depends on:
- `go.opentelemetry.io/otel` and sub-packages — API and SDK
- `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`
- `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc`
- `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc`
- `go.opentelemetry.io/otel/sdk/trace`, `.../metric`, `.../log`
No micro-lib dependencies. No `launcher` dependency — telemetry has no Component lifecycle.
## Key Design Decisions
- **Tier 5 / app-only** (ADR-001): Libraries use only the OTel API (no-op default). This module activates the real SDK. Importing it from a library is a mistake.
- **Three-signal OTLP bootstrap** (ADR-002): `New(ctx, cfg)` sets up traces → Tempo, metrics → Mimir, logs → Loki, all over a single OTLP gRPC endpoint. W3C TraceContext + Baggage propagation is set globally.
- **Global provider strategy** (ADR-003): Libraries call `otel.Tracer(...)` / `otel.Meter(...)` / `global.Logger(...)`. After `telemetry.New`, those calls route to the real SDK with no library changes required.
- **No `launcher.Component`**: Telemetry is not a lifecycle component. The caller defers the returned shutdown function directly in `main`. This keeps the module dependency graph minimal and the interface simple.
- **Sequential error rollback**: If any exporter fails to initialize, all previously created providers are shut down before the error is returned. The process never runs with a partial telemetry state.
## Patterns
**Standard application usage:**
```go
func main() {
	ctx := context.Background()
	shutdown, err := telemetry.New(ctx, telemetry.Config{
		ServiceName:    "order-service",
		ServiceVersion: "1.4.2",
		Environment:    "production",
		OTLPEndpoint:   "alloy:4317",
		OTLPInsecure:   false,
	})
	if err != nil {
		log.Fatalf("telemetry: %v", err)
	}
	defer shutdown(ctx)
	// Rest of application wiring...
}
```
**With launcher (wire shutdown into lifecycle):**
```go
shutdown, err := telemetry.New(ctx, cfg)
if err != nil {
	return err
}
lc.BeforeStop(func() error { return shutdown(ctx) })
```
**Config env vars:**
| Variable | Required | Default | Description |
|---|---|---|---|
| `OTEL_SERVICE_NAME` | yes | — | Service name in all signals |
| `OTEL_SERVICE_VERSION` | no | `unknown` | Deployed version |
| `OTEL_ENVIRONMENT` | no | `development` | Deployment environment |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | yes | — | OTLP gRPC collector address (e.g. `alloy:4317`) |
| `OTEL_EXPORTER_OTLP_INSECURE` | no | `false` | Disable TLS (set `true` for local dev) |
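
The table can be read as a small loading routine. The sketch below is an assumption about how an application might map these variables onto the config fields; `configFromEnv` is a hypothetical helper, and the local `Config` type only mirrors the fields shown in the usage example (the real type lives in the telemetry package).

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Config mirrors the fields from the usage example, for illustration only.
type Config struct {
	ServiceName    string
	ServiceVersion string
	Environment    string
	OTLPEndpoint   string
	OTLPInsecure   bool
}

// configFromEnv (hypothetical) applies the defaults and required checks from
// the table. It takes a lookup function so it can be tested without touching
// the process environment; pass os.Getenv in a real main.
func configFromEnv(getenv func(string) string) (Config, error) {
	orDefault := func(key, def string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return def
	}

	cfg := Config{
		ServiceName:    getenv("OTEL_SERVICE_NAME"),
		ServiceVersion: orDefault("OTEL_SERVICE_VERSION", "unknown"),
		Environment:    orDefault("OTEL_ENVIRONMENT", "development"),
		OTLPEndpoint:   getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
	}
	if cfg.ServiceName == "" {
		return Config{}, errors.New("OTEL_SERVICE_NAME is required")
	}
	if cfg.OTLPEndpoint == "" {
		return Config{}, errors.New("OTEL_EXPORTER_OTLP_ENDPOINT is required")
	}

	insecure, err := strconv.ParseBool(orDefault("OTEL_EXPORTER_OTLP_INSECURE", "false"))
	if err != nil {
		return Config{}, fmt.Errorf("OTEL_EXPORTER_OTLP_INSECURE: %w", err)
	}
	cfg.OTLPInsecure = insecure
	return cfg, nil
}

func main() {
	// A fixed map stands in for the environment so the example is deterministic.
	env := map[string]string{
		"OTEL_SERVICE_NAME":           "order-service",
		"OTEL_EXPORTER_OTLP_ENDPOINT": "alloy:4317",
	}
	cfg, err := configFromEnv(func(k string) string { return env[k] })
	if err != nil {
		fmt.Println("config:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```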
## What to Avoid
- Do not import this module from any non-`main` package. Libraries must use only OTel API packages.
- Do not call `telemetry.New` more than once per process. Each call overwrites the global providers.
- Do not omit the `defer shutdown(ctx)`. Without it, buffered spans, metrics, and log records may be lost on exit.
- Do not use a zero-value `Config`. Both `ServiceName` and `OTLPEndpoint` are required; `New` will return an error if the OTLP connection cannot be established.
- Do not wrap this in a `launcher.Component`. The shutdown function pattern is simpler and avoids adding a `launcher` dependency to this module.
## Testing Notes
- The test file (`telemetry_test.go`) uses a `fakeCollector` that opens a TCP listener but speaks no gRPC protocol. This is sufficient to test that `New` succeeds and returns a callable shutdown function — the fake server accepts connections so the gRPC dial does not get connection-refused.
- Tests that verify global provider replacement (`TestNew_SetsGlobalTracerProvider`, `TestNew_SetsGlobalMeterProvider`) must call `shutdown` in a `t.Cleanup` to restore state for subsequent tests. The short shutdown timeout (200ms) is intentional — the fake server cannot complete a gRPC flush, so errors from `shutdown(ctx)` are expected and ignored.
- `newResource` is tested separately (`TestNewResource_Fields`, `TestNewResource_MergesWithDefault`) as a pure function with no I/O.
- Do not test against a real Alloy or Tempo instance in unit tests. Use the fake collector pattern.