telemetry has zero micro-lib dependencies — only the external OTel SDK. Tier 1 reflects its actual position in the dependency graph. The Tier 5 label was misleading about push/tag ordering; telemetry can be released independently of all other micro-lib modules.
# telemetry

Bootstraps the full OpenTelemetry SDK (traces, metrics, logs) with OTLP gRPC exporters targeting Grafana Alloy.

## Purpose

Sets the three OTel global providers so that all micro-libs using the OTel global API auto-instrument without any code changes. Returns a shutdown function that flushes all exporters on process exit. This module is the single place in an application where the OTel SDK is wired up.

## Tier & Dependencies

**Tier 1** (no micro-lib dependencies; external OTel SDK only). Must never be imported by framework libraries.

Depends on:

- `go.opentelemetry.io/otel` and sub-packages (API and SDK)
- `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`
- `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc`
- `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc`
- `go.opentelemetry.io/otel/sdk/trace`, `.../metric`, `.../log`

No micro-lib dependencies. No `launcher` dependency, because telemetry has no Component lifecycle.

## Key Design Decisions

- **Tier 1 / app-only** (ADR-001): Libraries use only the OTel API (no-op default). This module activates the real SDK. Importing it from a library is a mistake.
- **Three-signal OTLP bootstrap** (ADR-002): `New(ctx, cfg)` sets up traces → Tempo, metrics → Mimir, logs → Loki, all over a single OTLP gRPC endpoint. W3C TraceContext + Baggage propagation is set globally.
- **Global provider strategy** (ADR-003): Libraries call `otel.Tracer(...)` / `otel.Meter(...)` / `global.Logger(...)`. After `telemetry.New`, those calls route to the real SDK with no library changes required.
- **No `launcher.Component`**: Telemetry is not a lifecycle component. The caller defers the returned shutdown function directly in `main`. This keeps the module dependency graph minimal and the interface simple.
- **Sequential error rollback**: If any exporter fails to initialize, all previously created providers are shut down before the error is returned. The process never runs with a partial telemetry state.

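The sequential rollback described in the list above can be sketched with the standard library alone. This is a minimal illustration, not the module's actual code: `step`, `bootstrap`, and `simulate` are hypothetical names, and the fake initializers stand in for the real trace, metric, and log provider setup.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// shutdownFunc has the same shape as an OTel provider's Shutdown method.
type shutdownFunc func(context.Context) error

// step is a hypothetical initializer that builds one provider and returns
// its shutdown hook.
type step struct {
	name string
	init func(context.Context) (shutdownFunc, error)
}

// bootstrap runs the steps in order. If one fails, every provider created so
// far is shut down (newest first) before the error is returned, so the caller
// never sees a partially initialized state.
func bootstrap(ctx context.Context, steps []step) (shutdownFunc, error) {
	var started []shutdownFunc

	// shutdownAll stops providers in reverse creation order and joins errors.
	shutdownAll := func(ctx context.Context) error {
		var errs []error
		for i := len(started) - 1; i >= 0; i-- {
			errs = append(errs, started[i](ctx))
		}
		return errors.Join(errs...)
	}

	for _, s := range steps {
		stop, err := s.init(ctx)
		if err != nil {
			initErr := fmt.Errorf("init %s: %w", s.name, err)
			return nil, errors.Join(initErr, shutdownAll(ctx))
		}
		started = append(started, stop)
	}
	return shutdownAll, nil
}

// simulate wires three fake signals and makes the one at index failAt fail
// (pass -1 for no failure). It returns the recorded event order.
func simulate(failAt int) ([]string, error) {
	var events []string
	var steps []step
	for i, name := range []string{"traces", "metrics", "logs"} {
		i, name := i, name // per-iteration copies for older Go versions
		steps = append(steps, step{
			name: name,
			init: func(context.Context) (shutdownFunc, error) {
				if i == failAt {
					return nil, errors.New("exporter unavailable")
				}
				events = append(events, "init "+name)
				return func(context.Context) error {
					events = append(events, "shutdown "+name)
					return nil
				}, nil
			},
		})
	}
	_, err := bootstrap(context.Background(), steps)
	return events, err
}

func main() {
	events, err := simulate(2) // the third signal (logs) fails
	fmt.Println(events)
	fmt.Println(err)
}
```

Shutting down in reverse creation order is a design choice of this sketch; the source only states that previously created providers are shut down before the error is returned.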
## Patterns

**Standard application usage:**

```go
func main() {
	ctx := context.Background()
	// Wire up all three signals; abort startup if telemetry cannot initialize.
	shutdown, err := telemetry.New(ctx, telemetry.Config{
		ServiceName:    "order-service",
		ServiceVersion: "1.4.2",
		Environment:    "production",
		OTLPEndpoint:   "alloy:4317",
		OTLPInsecure:   false,
	})
	if err != nil {
		log.Fatalf("telemetry: %v", err)
	}
	// Flush buffered telemetry when main returns.
	defer shutdown(ctx)

	// Rest of application wiring...
}
```

**With launcher (wire shutdown into lifecycle):**

```go
shutdown, err := telemetry.New(ctx, cfg)
if err != nil {
	return err
}
lc.BeforeStop(func() error { return shutdown(ctx) })
```

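Both snippets above pass the startup `ctx` to `shutdown`. If that context may already be cancelled when the process stops, the final flush can be cut short. One possible way to avoid this, sketched here under that assumption, is to give the shutdown call its own deadline. `withTimeout` and `probe` are hypothetical helpers and are not part of this module.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// withTimeout adapts a shutdown function with the signature
// func(context.Context) error into a func() error that runs with a fresh,
// bounded context instead of the caller's startup context.
func withTimeout(shutdown func(context.Context) error, d time.Duration) func() error {
	return func() error {
		ctx, cancel := context.WithTimeout(context.Background(), d)
		defer cancel()
		return shutdown(ctx)
	}
}

// probe runs withTimeout against a stand-in shutdown function and reports
// whether the context it received carried a deadline.
func probe() (bool, error) {
	var sawDeadline bool
	stop := withTimeout(func(ctx context.Context) error {
		_, sawDeadline = ctx.Deadline()
		return ctx.Err() // nil while the deadline has not expired
	}, 5*time.Second)
	err := stop()
	return sawDeadline, err
}

func main() {
	sawDeadline, err := probe()
	fmt.Println("deadline set:", sawDeadline, "error:", err)
}
```

With a wrapper like this, the launcher hook could read `lc.BeforeStop(withTimeout(shutdown, 5*time.Second))`, assuming `BeforeStop` accepts a `func() error` as in the snippet above. The five-second value is only a placeholder.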
**Config env vars:**

| Variable | Required | Default | Description |
|---|---|---|---|
| `OTEL_SERVICE_NAME` | yes | - | Service name in all signals |
| `OTEL_SERVICE_VERSION` | no | `unknown` | Deployed version |
| `OTEL_ENVIRONMENT` | no | `development` | Deployment environment |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | yes | - | OTLP gRPC collector address (e.g. `alloy:4317`) |
| `OTEL_EXPORTER_OTLP_INSECURE` | no | `false` | Disable TLS (set `true` for local dev) |

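The table above could be turned into a `Config` roughly as follows. This is a sketch under assumptions: the `Config` fields are copied from the usage example, `configFromEnv` is a hypothetical helper, and how the module actually reads these variables is not shown in this document.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Config mirrors the fields used in the usage example.
type Config struct {
	ServiceName    string
	ServiceVersion string
	Environment    string
	OTLPEndpoint   string
	OTLPInsecure   bool
}

// configFromEnv is a hypothetical loader. lookup has the shape of
// os.LookupEnv so that tests can substitute a map.
func configFromEnv(lookup func(string) (string, bool)) (Config, error) {
	// get returns the variable's value, or def when it is unset or empty.
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}

	cfg := Config{
		ServiceName:    get("OTEL_SERVICE_NAME", ""),
		ServiceVersion: get("OTEL_SERVICE_VERSION", "unknown"),
		Environment:    get("OTEL_ENVIRONMENT", "development"),
		OTLPEndpoint:   get("OTEL_EXPORTER_OTLP_ENDPOINT", ""),
	}
	if cfg.ServiceName == "" {
		return Config{}, errors.New("OTEL_SERVICE_NAME is required")
	}
	if cfg.OTLPEndpoint == "" {
		return Config{}, errors.New("OTEL_EXPORTER_OTLP_ENDPOINT is required")
	}

	insecure, err := strconv.ParseBool(get("OTEL_EXPORTER_OTLP_INSECURE", "false"))
	if err != nil {
		return Config{}, fmt.Errorf("OTEL_EXPORTER_OTLP_INSECURE: %w", err)
	}
	cfg.OTLPInsecure = insecure
	return cfg, nil
}

func main() {
	cfg, err := configFromEnv(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Treating an empty variable the same as an unset one is a choice made for this sketch, not documented behavior.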
## What to Avoid

- Do not import this module from any non-`main` package. Libraries must use only OTel API packages.
- Do not call `telemetry.New` more than once per process. Each call overwrites the global providers.
- Do not omit the `defer shutdown(ctx)`. Without it, buffered spans, metrics, and log records can be lost on exit.
- Do not pass a zero-value `Config`. Both `ServiceName` and `OTLPEndpoint` are required, and `New` returns an error if the OTLP connection cannot be established.
- Do not wrap this in a `launcher.Component`. The shutdown function pattern is simpler and avoids adding a `launcher` dependency to this module.

## Testing Notes

- The test file (`telemetry_test.go`) uses a `fakeCollector` that opens a TCP listener but speaks no gRPC protocol. This is sufficient to test that `New` succeeds and returns a callable shutdown function: the fake server accepts connections, so the gRPC dial is not refused.
- Tests that verify global provider replacement (`TestNew_SetsGlobalTracerProvider`, `TestNew_SetsGlobalMeterProvider`) must call `shutdown` in a `t.Cleanup` to restore state for subsequent tests. The short shutdown timeout (200 ms) is intentional: the fake server cannot complete a gRPC flush, so errors from `shutdown(ctx)` are expected and ignored.
- `newResource` is tested separately (`TestNewResource_Fields`, `TestNewResource_MergesWithDefault`) as a pure function with no I/O.
- Do not test against a real Alloy or Tempo instance in unit tests. Use the fake collector pattern.

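The fake collector pattern can be approximated with a plain TCP listener. The sketch below is an assumption about the general shape, not a copy of the `fakeCollector` in `telemetry_test.go`; `startFakeCollector` and `canDial` are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// startFakeCollector listens on a random localhost port and accepts
// connections without ever speaking gRPC. It returns the listener address
// and a stop function that closes the listener and any held connections.
func startFakeCollector() (string, func(), error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}

	done := make(chan struct{})
	go func() {
		defer close(done)
		var conns []net.Conn
		defer func() {
			for _, c := range conns {
				c.Close()
			}
		}()
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed by stop
			}
			conns = append(conns, conn) // hold the connection open, never reply
		}
	}()

	stop := func() {
		ln.Close()
		<-done
	}
	return ln.Addr().String(), stop, nil
}

// canDial reports whether a plain TCP connection to addr succeeds.
func canDial(addr string) bool {
	conn, err := net.DialTimeout("tcp", addr, time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	addr, stop, err := startFakeCollector()
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer stop()
	fmt.Println("dial succeeded:", canDial(addr))
}
```

In a test, the returned address would be passed as `OTLPEndpoint` and `stop` registered with `t.Cleanup`.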