OpenTelemetry
BitRouter is OpenTelemetry-native. Every request you send through the router becomes a trace — the full lifecycle from ingress through routing, each upstream attempt (including failovers), and settlement — plus a set of metrics, all following the OpenTelemetry GenAI semantic conventions and pushed over OTLP to any backend you already run.
Everything on this page is open-source and runs entirely on your own infrastructure; there is no BitRouter telemetry endpoint in the middle. Export stays off until you configure an endpoint, and message content is excluded by default. If you'd rather not run a collector, BitRouter Cloud gives you a hosted request view with nothing to operate.
How a trace looks
Each request produces a span tree:

    HTTP SERVER POST /v1/chat/completions (ingress)
    └─ chat (INTERNAL, inbound: whole request lifetime)
       ├─ route (INTERNAL: routing decision)
       ├─ chat (CLIENT: upstream attempt #1, gen_ai.* attributes)
       ├─ chat (CLIENT: failover attempt #2)
       └─ settle (INTERNAL: settlement summary)

There is one GenAI generation per request: the inbound chat span. Each
upstream attempt is a separate CLIENT span, so a failover chain shows every
provider it tried, in order, with the latency and outcome of each hop.

BitRouter extracts inbound W3C trace context and injects an outbound
traceparent, so router spans stitch into the parent trace from your agent or
gateway. Because each attempt is its own span, traces are where failover and
routing behavior become legible: which provider was tried first, why it fell
through, and where latency went across the chain.
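
To make the trace-context stitching concrete, here is a minimal sketch of how a caller might build and validate the W3C `traceparent` header it sends with a request. The helper names `make_traceparent` and `parse_traceparent` are hypothetical and not part of BitRouter; the header layout (version, trace-id, parent-id, flags) follows the W3C Trace Context format. In practice an OpenTelemetry SDK in your agent would normally generate this header for you.

```python
import re
import secrets

# W3C Trace Context layout: version "00", 16-byte trace-id, 8-byte parent-id,
# 1-byte flags, all lowercase hex, separated by dashes.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(sampled: bool = True) -> str:
    """Build a traceparent value for a new root span (hypothetical helper)."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value: str) -> dict:
    """Split a traceparent value into fields; raise ValueError if malformed."""
    match = TRACEPARENT_RE.match(value.strip().lower())
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid in the W3C format.
    if int(trace_id, 16) == 0 or int(span_id, 16) == 0:
        raise ValueError("trace-id and parent-id must be non-zero")
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 1),
    }


# Headers a caller would attach to POST /v1/chat/completions so that the
# router's spans join the caller's trace.
headers = {
    "Content-Type": "application/json",
    "traceparent": make_traceparent(sampled=True),
}
```

The sampled flag in the last field is what the parent-based samplers consult when deciding whether to follow the caller's decision.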
Span attributes
| Attribute | Description |
|---|---|
| gen_ai.provider.name | Upstream provider for the hop (e.g. openai, anthropic) |
| gen_ai.response.model | Model that actually served the response |
| gen_ai.token.type | input / output, on token measurements |
| outcome | Final disposition of the request |
| api_key_id, user_id | Caller attribution (cardinality-capped) |
| account_label | Logical account/tenant label |
Enable export
Add an otel block under the bitrouter-observe plugin and give it an endpoint.
That alone turns export on:

    plugins:
      bitrouter-observe:
        otel:
          endpoint: "http://localhost:4318" # your OTLP endpoint
          service_name: "bitrouter"

Keep secrets out of the committed file. Use ${VAR} references for any auth
headers; they are resolved from the environment at load time:

    plugins:
      bitrouter-observe:
        otel:
          endpoint: "https://api.honeycomb.io"
          headers:
            x-honeycomb-team: "${HONEYCOMB_API_KEY}"

Every field has an environment-variable override, so you can configure export
without touching the file. This is useful in containers, where you can run
with no otel block at all:
| Env var | Sets |
|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP endpoint URL |
| OTEL_EXPORTER_OTLP_HEADERS | Auth headers (comma-separated k=v) |
| OTEL_SERVICE_NAME | Resource service name |
| OTEL_RESOURCE_ATTRIBUTES | Extra resource attributes (comma-separated k=v) |
| OTEL_TRACES_SAMPLER | Sampler kind |
| OTEL_TRACES_SAMPLER_ARG | Sampler argument (e.g. ratio) |
| BITROUTER_OBSERVE_CONTENT_CAPTURE | Content capture mode |
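
As a rough illustration of the two value formats used above, the sketch below expands `${VAR}` references from an environment mapping and parses a comma-separated `k=v` list such as the one in OTEL_EXPORTER_OTLP_HEADERS. Both function names are hypothetical, and this is not BitRouter's loader: raising an error on an unset variable is an assumption about desirable behavior, and URL-decoding of values is left out for brevity.

```python
import os
import re

# Matches ${NAME} where NAME looks like a shell-style variable name.
_VAR_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_vars(value: str, env=None) -> str:
    """Replace each ${NAME} in value with env[NAME] (hypothetical helper).

    Assumption: an unset variable is an error rather than an empty string.
    """
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _VAR_REF.sub(replace, value)


def parse_kv_list(raw: str) -> dict:
    """Parse 'k1=v1,k2=v2' into a dict (hypothetical helper).

    Splits each item on the first '=' only, so values that contain '='
    (for example base64 padding) survive intact.
    """
    pairs = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled commas
        key, sep, val = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        pairs[key.strip()] = val.strip()
    return pairs


# Example: the header value from the config file and the equivalent env form.
fake_env = {"HONEYCOMB_API_KEY": "example-key"}
header_value = expand_vars("${HONEYCOMB_API_KEY}", fake_env)
env_headers = parse_kv_list("x-honeycomb-team=example-key")
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the sketch easy to exercise without touching the real environment.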
The OTLP transport is selected when the binary is built: otel-http
(OTLP/HTTP + protobuf, the default) or otel-grpc (OTLP/gRPC). The configuration
on this page is identical for both — you only care about the transport if your
backend speaks one and not the other.
Metrics
Alongside traces, metrics are exported over OTLP on an interval (default 60s):
| Metric | Type | Measures |
|---|---|---|
| bitrouter.requests | Counter | Requests processed |
| gen_ai.client.operation.duration | Histogram | Request latency |
| gen_ai.client.token.usage | Histogram | Token counts (by gen_ai.token.type) |
| bitrouter.errors | Counter | Errors |
| bitrouter.stream_parts | Counter | Streaming parts emitted |
Dimensions include gen_ai.provider.name, gen_ai.response.model, outcome,
account_label, and the caller identifiers. To keep metric cardinality bounded
on shared deployments, api_key_id and user_id are capped (defaults: 1024 and
256 distinct values); beyond the cap, values collapse to an overflow bucket.
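
The capping behavior described above can be pictured with a small sketch: the first N distinct values of a dimension keep their own label, and anything seen after the cap is reached is reported under a shared overflow label. The class name `CardinalityCap` and the overflow label `_overflow` are hypothetical; the actual label BitRouter emits and its eviction policy are not specified here.

```python
class CardinalityCap:
    """Bound the number of distinct label values for one metric dimension.

    Hypothetical sketch: the first `limit` distinct values are passed through
    unchanged; later unseen values collapse to a single overflow label.
    """

    def __init__(self, limit: int, overflow: str = "_overflow"):
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.overflow = overflow  # assumed label name, for illustration only
        self._seen = set()

    def label(self, value: str) -> str:
        """Return the label to record for this value."""
        if value in self._seen:
            return value  # already admitted, keep its own series
        if len(self._seen) < self.limit:
            self._seen.add(value)
            return value  # still under the cap, admit it
        return self.overflow  # cap reached, collapse into the bucket


# Example with a tiny cap so the overflow is visible.
cap = CardinalityCap(limit=2)
labels = [cap.label(v) for v in ["key-a", "key-b", "key-c", "key-a"]]
```

With a limit of 2, `key-c` lands in the overflow bucket while `key-a` keeps its own series because it was admitted before the cap was reached.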
There is no Prometheus scrape endpoint. GET /metrics is retired — metrics
are pushed via OTLP only. If your stack is Prometheus-based, ingest through an
OpenTelemetry Collector with the Prometheus exporter.
Backend recipes
Each block is the plugins.bitrouter-observe.otel config for a common backend.
OpenTelemetry Collector
Send everything to a local or in-cluster Collector and let it fan out to your real backends. This is also the path to a Prometheus-based stack: because BitRouter has no scrape endpoint, the Collector's Prometheus exporter bridges the gap.

    otel:
      endpoint: "http://otel-collector:4318"
      service_name: "bitrouter"
      resource_attributes:
        deployment.environment: "prod"

Honeycomb

    otel:
      endpoint: "https://api.honeycomb.io"
      service_name: "bitrouter"
      headers:
        x-honeycomb-team: "${HONEYCOMB_API_KEY}"

Grafana Cloud / Tempo
Grafana Cloud's OTLP gateway uses basic auth (instance ID + API token, base64 encoded). For self-hosted Tempo, point at its OTLP port and drop the header.
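
One way to produce the base64 credential is sketched below, assuming the usual HTTP Basic scheme of `instance_id:api_token` encoded as base64. The function name and the placeholder values are hypothetical; check your Grafana Cloud stack's OTLP settings for the exact instance ID and token to use.

```python
import base64


def grafana_basic_credential(instance_id: str, api_token: str) -> str:
    """Return base64("instance_id:api_token") for an HTTP Basic auth header.

    Hypothetical helper: the result is what would be exported as
    GRAFANA_OTLP_TOKEN and placed after the word "Basic".
    """
    raw = f"{instance_id}:{api_token}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


# Placeholder values for illustration only.
credential = grafana_basic_credential("123456", "example-token")
authorization_header = f"Basic {credential}"
```

Export the resulting credential as GRAFANA_OTLP_TOKEN so the config can reference it without the secret being committed.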

    otel:
      endpoint: "https://otlp-gateway-<region>.grafana.net/otlp"
      service_name: "bitrouter"
      headers:
        Authorization: "Basic ${GRAFANA_OTLP_TOKEN}"

Datadog
Datadog ingests OTLP through the Datadog Agent rather than a public OTLP URL — run the Agent with OTLP receiving enabled and point BitRouter at it:

    otel:
      endpoint: "http://datadog-agent:4318"
      service_name: "bitrouter"
      resource_attributes:
        deployment.environment: "prod"

Tune sampling
By default BitRouter respects the inbound trace decision and otherwise samples
everything (parentbased_always_on). Under high throughput, sample a fraction
of root traces instead:

    otel:
      endpoint: "http://otel-collector:4318"
      sampler: "parentbased_traceidratio"
      sampler_arg: 0.1 # keep 10% of root traces

| sampler | Behavior |
|---|---|
| always_on | Sample every trace |
| always_off | Sample nothing |
| traceidratio | Sample a fraction (sampler_arg), ignoring parent |
| parentbased_always_on | Follow parent; sample if no parent (default) |
| parentbased_always_off | Follow parent; drop if no parent |
| parentbased_traceidratio | Follow parent; otherwise sample sampler_arg |
parentbased_* variants honor the upstream decision, so a trace your agent
started won't be half-sampled at the router. The metrics export interval and the
trace batch queue are tunable separately under metrics and traces.batch if
you need to trade freshness for overhead.
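
The sketch below illustrates the decision logic behind parentbased_traceidratio in general terms: follow the parent's sampled flag when there is one, otherwise make a deterministic ratio decision from the trace ID. Comparing the low 64 bits of the trace ID against a threshold is one common way SDKs implement ratio sampling; it is shown here as an assumption, not as BitRouter's exact algorithm, and the function names are hypothetical.

```python
from typing import Optional

_MASK_64 = (1 << 64) - 1  # keep the low 64 bits of a 128-bit trace ID


def ratio_sampled(trace_id: int, ratio: float) -> bool:
    """Deterministic ratio decision: the same trace ID always gets the same answer."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * (1 << 64))  # threshold in the 64-bit ID space
    return (trace_id & _MASK_64) < bound


def parentbased_traceidratio(
    trace_id: int, ratio: float, parent_sampled: Optional[bool]
) -> bool:
    """Follow the parent's decision if present, else fall back to the ratio.

    parent_sampled is None for a root trace (no inbound trace context).
    """
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id, ratio)


# A trace started upstream keeps its decision regardless of the ratio.
follows_parent = parentbased_traceidratio(0xABC, 0.0, parent_sampled=True)
# A root trace at ratio 1.0 is always kept; at ratio 0.0 it is always dropped.
root_kept = parentbased_traceidratio(0xABC, 1.0, parent_sampled=None)
root_dropped = parentbased_traceidratio(0xABC, 0.0, parent_sampled=None)
```

Because the decision depends only on the trace ID, every service that applies the same ratio to the same trace reaches the same verdict, which is what keeps traces from being partially sampled.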
Content capture
Prompt and response content is excluded by default (content_capture: off).
Turn it on only when you need prompt and response bodies on the spans for
debugging:

    otel:
      content_capture: "full" # off (default) | full

full writes user prompts and model responses into your telemetry backend.
That content then inherits the backend's access controls and retention. For
shared or regulated environments, leave it off and capture content only in a
scoped, short-lived debugging session.
Verify
Reload (or restart) the router, then ask the running daemon what it's doing:

    bitrouter reload # pick up config changes without dropping connections
    bitrouter observe status # endpoint, sampler, cardinality, in-flight spans
    bitrouter observe status --json

If it reports stopped, the exporter isn't wired. Check that the otel block
has an endpoint (or that OTEL_EXPORTER_OTLP_ENDPOINT is set) and that the
binary was built with an OTLP transport feature. Then send a request through the
router and confirm the trace lands in your backend; you should see one inbound
chat span per request with a CLIENT child for each upstream attempt.