telemetry has zero micro-lib dependencies — only the external OTel SDK. Tier 1 reflects its actual position in the dependency graph. The Tier 5 label was misleading about push/tag ordering; telemetry can be released independently of all other micro-lib modules.
telemetry
Bootstraps the full OpenTelemetry SDK (traces, metrics, logs) with OTLP gRPC exporters targeting Grafana Alloy.
Purpose
Sets the three OTel global providers so that all micro-libs using the OTel global API auto-instrument without any code changes. Returns a shutdown function that flushes all exporters on process exit. This module is the single place in an application where the OTel SDK is wired up.
Tier & Dependencies
Tier 1 (no micro-lib dependencies; external OTel SDK only). Must never be imported by framework libraries.
Depends on:
- `go.opentelemetry.io/otel` and sub-packages: API and SDK
- `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`
- `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc`
- `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc`
- `go.opentelemetry.io/otel/sdk/trace`, `.../metric`, `.../log`
No micro-lib dependencies. No launcher dependency — telemetry has no Component lifecycle.
Key Design Decisions
- Tier 1 / app-only (ADR-001): Libraries use only the OTel API (no-op default). This module activates the real SDK. Importing it from a library is a mistake.
- Three-signal OTLP bootstrap (ADR-002): `New(ctx, cfg)` sets up traces → Tempo, metrics → Mimir, logs → Loki, all over a single OTLP gRPC endpoint. W3C TraceContext + Baggage propagation is set globally.
- Global provider strategy (ADR-003): Libraries call `otel.Tracer(...)` / `otel.Meter(...)` / `global.Logger(...)`. After `telemetry.New`, those calls route to the real SDK with no library changes required.
- No `launcher.Component`: Telemetry is not a lifecycle component. The caller defers the returned shutdown function directly in `main`. This keeps the module dependency graph minimal and the interface simple.
- Sequential error rollback: If any exporter fails to initialize, all previously created providers are shut down before the error is returned. The process never runs with a partial telemetry state.
Patterns
Standard application usage:
    func main() {
        ctx := context.Background()
        shutdown, err := telemetry.New(ctx, telemetry.Config{
            ServiceName:    "order-service",
            ServiceVersion: "1.4.2",
            Environment:    "production",
            OTLPEndpoint:   "alloy:4317",
            OTLPInsecure:   false,
        })
        if err != nil {
            log.Fatalf("telemetry: %v", err)
        }
        // Flush all exporters on exit.
        defer shutdown(ctx)
        // Rest of application wiring...
    }
With launcher (wire shutdown into lifecycle):
    shutdown, err := telemetry.New(ctx, cfg)
    if err != nil {
        return err
    }
    lc.BeforeStop(func() error { return shutdown(ctx) })
Config env vars:
| Variable | Required | Default | Description |
|---|---|---|---|
| `OTEL_SERVICE_NAME` | yes | - | Service name in all signals |
| `OTEL_SERVICE_VERSION` | no | `unknown` | Deployed version |
| `OTEL_ENVIRONMENT` | no | `development` | Deployment environment |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | yes | - | OTLP gRPC collector address (e.g. `alloy:4317`) |
| `OTEL_EXPORTER_OTLP_INSECURE` | no | `false` | Disable TLS (set `true` for local dev) |
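
One way to map the variables in the table onto a config struct is sketched below. This document does not show how the module itself reads the environment, so `configFromEnv` is a hypothetical helper, and the local `Config` type only mirrors the fields used in the usage example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Config mirrors the fields shown in the usage example (local copy for this sketch).
type Config struct {
	ServiceName    string
	ServiceVersion string
	Environment    string
	OTLPEndpoint   string
	OTLPInsecure   bool
}

// configFromEnv is a hypothetical loader. lookup is normally os.LookupEnv;
// taking it as a parameter keeps the function easy to test.
func configFromEnv(lookup func(string) (string, bool)) (Config, error) {
	// get returns the variable's value, or def when it is unset or empty.
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}

	cfg := Config{
		ServiceName:    get("OTEL_SERVICE_NAME", ""),
		ServiceVersion: get("OTEL_SERVICE_VERSION", "unknown"),
		Environment:    get("OTEL_ENVIRONMENT", "development"),
		OTLPEndpoint:   get("OTEL_EXPORTER_OTLP_ENDPOINT", ""),
	}

	// The two required variables have no default.
	if cfg.ServiceName == "" {
		return Config{}, errors.New("OTEL_SERVICE_NAME is required")
	}
	if cfg.OTLPEndpoint == "" {
		return Config{}, errors.New("OTEL_EXPORTER_OTLP_ENDPOINT is required")
	}

	insecure, err := strconv.ParseBool(get("OTEL_EXPORTER_OTLP_INSECURE", "false"))
	if err != nil {
		return Config{}, fmt.Errorf("OTEL_EXPORTER_OTLP_INSECURE: %w", err)
	}
	cfg.OTLPInsecure = insecure
	return cfg, nil
}

func main() {
	cfg, err := configFromEnv(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Validating the required values before calling `New` gives a clearer error than a failed connection attempt to an empty endpoint.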
What to Avoid
- Do not import this module from any non-`main` package. Libraries must use only OTel API packages.
- Do not call `telemetry.New` more than once per process. Each call overwrites the global providers.
- Do not omit the `defer shutdown(ctx)`. Without it, buffered spans and metrics are lost on exit.
- Do not use a zero-value `Config`. Both `ServiceName` and `OTLPEndpoint` are required; `New` will return an error if the OTLP connection cannot be established.
- Do not wrap this in a `launcher.Component`. The shutdown function pattern is simpler and avoids adding a `launcher` dependency to this module.
Testing Notes
- The test file (`telemetry_test.go`) uses a `fakeCollector` that opens a TCP listener but speaks no gRPC protocol. This is sufficient to test that `New` succeeds and returns a callable shutdown function: the fake server accepts connections, so the gRPC dial does not get connection-refused.
- Tests that verify global provider replacement (`TestNew_SetsGlobalTracerProvider`, `TestNew_SetsGlobalMeterProvider`) must call `shutdown` in a `t.Cleanup` to restore state for subsequent tests. The short shutdown timeout (200ms) is intentional: the fake server cannot complete a gRPC flush, so errors from `shutdown(ctx)` are expected and ignored.
- `newResource` is tested separately (`TestNewResource_Fields`, `TestNewResource_MergesWithDefault`) as a pure function with no I/O.
- Do not test against a real Alloy or Tempo instance in unit tests. Use the fake collector pattern.