# ADR-001: SDK Health Check via Known-Nonexistent UID Probe
**Status:** Accepted
**Date:** 2026-03-18
## Context
The firebase-admin-go SDK wraps gRPC and HTTP internally. There is no explicit "ping" or
"connection check" method on the `firebase.App` or `auth.Client` types. Health checking
requires making a real API call that exercises the actual SDK communication path.
Two naive approaches have drawbacks:
1. **Check for a known real user**: Requires a stable test user to exist in production
Firebase. This is a maintenance burden and a security concern.
2. **String-match on the error message**: Coupling to SDK error message text is fragile;
internal messages change across SDK versions without notice.
## Decision
The health check calls `authClient.GetUser(ctx, "health-probe-non-existent")` with a UID
that is not expected to exist. The expected outcome is a "user not found" error. The SDK
provides a typed predicate for this:
```go
_, err = authClient.GetUser(ctx, "health-probe-non-existent")
if err != nil {
	if auth.IsUserNotFound(err) {
		return nil // expected: the probe succeeded, service is reachable
	}
	return err // unexpected error: network failure, auth error, etc.
}
return nil
```
`auth.IsUserNotFound(err)` is an official SDK helper that inspects the error's underlying
type, not its message string. It is stable across SDK versions.
A "not found" response proves:
- The Firebase project is reachable.
- Authentication (ADC or service account key) is valid.
- The Auth service responded correctly.

Any other error (permission denied, network timeout, invalid project) is treated as a health
failure and propagated to the health framework.
## Consequences
**Positive:**
- The probe exercises the actual authentication and network path.
- `auth.IsUserNotFound` is a stable, typed check that does not depend on error messages.
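- No real user is needed. The probe UID would collide with a real account only if someone
deliberately created a user with that exact custom UID, and even then the check still
passes, because a successful `GetUser` also returns `nil`.
- The component is marked `health.LevelCritical`: if the probe fails, the service is
considered unhealthy.

**Negative:**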
- Every health check invocation makes a live API call to Firebase. Under high-frequency
health polling, this generates traffic. Health check intervals should be configured
conservatively (e.g. every 30 seconds, not every second).
- The probe UID `"health-probe-non-existent"` is hardcoded. It is not configurable.