OpenTelemetry

BitRouter is OpenTelemetry-native. Every request you send through the router produces a trace covering its full lifecycle (ingress, the routing decision, each upstream attempt including failovers, and settlement) plus a set of metrics. Both follow the OpenTelemetry GenAI semantic conventions and are pushed over OTLP to any backend you already run.

Everything on this page is open-source and runs entirely on your own infrastructure; there is no BitRouter telemetry endpoint in the middle. Export stays off until you configure an endpoint, and message content is excluded by default. If you'd rather not run a collector, BitRouter Cloud gives you a hosted request view with nothing to operate.

How a trace looks

Each request produces a span tree:

HTTP SERVER  POST /v1/chat/completions        (ingress)
└─ chat      (INTERNAL, inbound — whole request lifetime)
   ├─ route  (INTERNAL — routing decision)
   ├─ chat   (CLIENT — upstream attempt #1, gen_ai.* attributes)
   ├─ chat   (CLIENT — failover attempt #2)
   └─ settle (INTERNAL — settlement summary)

Each request has exactly one GenAI generation: the inbound chat span. Every upstream attempt is a separate CLIENT span, so a failover chain shows each provider the router tried, in order, with the latency and outcome of each hop. That makes traces the place where routing and failover behavior become legible: which provider was tried first, why it fell through, and where the latency went across the chain. BitRouter also extracts inbound W3C trace context and injects an outbound traceparent header, so router spans stitch into the parent trace from your agent or gateway.
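For reference, the traceparent header that carries this context follows the W3C Trace Context format: a version, a 32-hex trace ID, a 16-hex parent span ID, and 2-hex flags, joined by hyphens. The Python sketch below parses an inbound header and builds the outbound header for an upstream attempt (same trace ID, fresh span ID, sampled flag carried over). It illustrates the propagation rule only; parse_traceparent and child_traceparent are hypothetical names, not BitRouter's code.

```python
import re
import secrets
from typing import NamedTuple, Optional

# W3C Trace Context, version 00: "00-<trace-id>-<parent-id>-<flags>" with
# a 32-hex trace ID, a 16-hex parent (span) ID and 2-hex trace flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse an inbound traceparent header (hypothetical helper).

    Returns None for malformed headers and for the all-zero IDs the spec
    forbids; a receiver then starts a new root trace instead.
    """
    m = _TRACEPARENT.match(header.strip())
    if m is None:
        return None
    trace_id, parent_id, flags = m.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    # Bit 0 of trace-flags is the "sampled" flag.
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: TraceParent) -> str:
    """Build the outbound header for an upstream attempt (hypothetical
    helper): same trace ID, fresh 8-byte span ID, sampled flag carried over."""
    span_id = secrets.token_hex(8)
    while span_id == "0" * 16:  # an all-zero span ID is invalid
        span_id = secrets.token_hex(8)
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{span_id}-{flags}"
```

Applied to the example header from the W3C specification, 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01, the outbound header keeps the trace ID and the 01 flags and differs only in the span ID.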

Span attributes

Attribute               Description
gen_ai.provider.name    Upstream provider for the hop (e.g. openai, anthropic)
gen_ai.response.model   Model that actually served the response
gen_ai.token.type       input / output, on token measurements
outcome                 Final disposition of the request
api_key_id, user_id     Caller attribution (cardinality-capped)
account_label           Logical account/tenant label

Enable export

Add an otel block under the bitrouter-observe plugin and give it an endpoint. That alone turns export on:

plugins:
  bitrouter-observe:
    otel:
      endpoint: "http://localhost:4318"   # your OTLP endpoint
      service_name: "bitrouter"

Keep secrets out of the committed file — use ${VAR} references for any auth headers, resolved from the environment at load time:

plugins:
  bitrouter-observe:
    otel:
      endpoint: "https://api.honeycomb.io"
      headers:
        x-honeycomb-team: "${HONEYCOMB_API_KEY}"
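
The ${VAR} resolution amounts to a string substitution against the environment at load time. A minimal Python sketch, assuming an unset variable should fail the load rather than silently become an empty header (how BitRouter treats unset variables isn't stated here); resolve_env_refs is a hypothetical name:

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment-variable identifier.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_refs(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace every ${NAME} in a config string with env[NAME].

    Hypothetical helper. Raises KeyError for unset variables so a missing
    secret fails loudly at load time instead of sending an empty header.
    """
    env = os.environ if env is None else env

    def lookup(m: re.Match) -> str:
        name = m.group(1)
        if name not in env:
            raise KeyError(f"config references unset variable ${{{name}}}")
        return env[name]

    return _ENV_REF.sub(lookup, value)
```

For example, resolve_env_refs("${HONEYCOMB_API_KEY}") returns the variable's value from the process environment, so the secret never has to appear in the committed file.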

Every field has an environment-variable override, so you can configure export without touching the file — useful in containers, where you can run with no otel block at all:

Env var                             Sets
OTEL_EXPORTER_OTLP_ENDPOINT         OTLP endpoint URL
OTEL_EXPORTER_OTLP_HEADERS          Auth headers (comma-separated k=v)
OTEL_SERVICE_NAME                   Resource service name
OTEL_RESOURCE_ATTRIBUTES            Extra resource attributes (comma-separated k=v)
OTEL_TRACES_SAMPLER                 Sampler kind
OTEL_TRACES_SAMPLER_ARG             Sampler argument (e.g. ratio)
BITROUTER_OBSERVE_CONTENT_CAPTURE   Content capture mode
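
The comma-separated k=v values for OTEL_EXPORTER_OTLP_HEADERS and OTEL_RESOURCE_ATTRIBUTES follow the standard OpenTelemetry environment-variable format, in which values are percent-decoded. The sketch below parses that format and overlays the variables onto the file's otel block, with the environment winning. The merge rules (env headers replace, rather than merge with, file headers) and the function names are assumptions for illustration.

```python
import os
from typing import Any, Dict, Mapping, Optional
from urllib.parse import unquote

# Env vars that set a single otel field (assumed mapping to the YAML keys above).
_SCALAR_OVERRIDES = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "endpoint",
    "OTEL_SERVICE_NAME": "service_name",
    "OTEL_TRACES_SAMPLER": "sampler",
    "BITROUTER_OBSERVE_CONTENT_CAPTURE": "content_capture",
}


def parse_kv_list(raw: str) -> Dict[str, str]:
    """Parse "k1=v1,k2=v2", percent-decoding values as the OpenTelemetry
    env-var convention specifies. Entries without "=" are skipped; only the
    first "=" splits, so base64 padding in a value survives."""
    pairs: Dict[str, str] = {}
    for item in raw.split(","):
        key, sep, value = item.partition("=")
        key = key.strip()
        if sep and key:
            pairs[key] = unquote(value.strip())
    return pairs


def effective_otel_config(file_cfg: Mapping[str, Any],
                          env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Overlay environment variables onto the file's otel block; env wins.
    Hypothetical helper: env headers and resource attributes replace the
    file's values wholesale rather than merging key by key."""
    env = os.environ if env is None else env
    cfg: Dict[str, Any] = dict(file_cfg)
    for var, key in _SCALAR_OVERRIDES.items():
        if env.get(var):
            cfg[key] = env[var]
    if env.get("OTEL_TRACES_SAMPLER_ARG"):
        cfg["sampler_arg"] = float(env["OTEL_TRACES_SAMPLER_ARG"])
    if env.get("OTEL_EXPORTER_OTLP_HEADERS"):
        cfg["headers"] = parse_kv_list(env["OTEL_EXPORTER_OTLP_HEADERS"])
    if env.get("OTEL_RESOURCE_ATTRIBUTES"):
        cfg["resource_attributes"] = parse_kv_list(env["OTEL_RESOURCE_ATTRIBUTES"])
    return cfg
```

Under this convention a header value containing a space, such as Authorization=Basic <token>, is written with %20 in place of the space.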

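The OTLP transport is selected when the binary is built, via the otel-http feature (OTLP/HTTP with protobuf, the default) or the otel-grpc feature (OTLP/gRPC). The configuration on this page is identical for both; the transport matters only if your backend accepts one and not the other. The examples here use port 4318, the conventional OTLP/HTTP port; OTLP/gRPC receivers conventionally listen on 4317.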
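With OTLP/HTTP, the OpenTelemetry convention treats a base endpoint such as http://localhost:4318 as a prefix and appends a per-signal path (v1/traces, v1/metrics); OTLP/gRPC uses the endpoint as-is. A minimal sketch, assuming BitRouter's endpoint field follows that convention (this page doesn't confirm it); signal_url is a hypothetical helper:

```python
from urllib.parse import urlsplit

# Per-signal paths appended to a base OTLP/HTTP endpoint.
_SIGNAL_PATHS = {"traces": "v1/traces", "metrics": "v1/metrics"}


def signal_url(endpoint: str, signal: str, transport: str = "http") -> str:
    """Return where an exporter would send `signal` (hypothetical helper).
    Assumes BitRouter's `endpoint` acts like OTEL_EXPORTER_OTLP_ENDPOINT."""
    if urlsplit(endpoint).scheme not in ("http", "https"):
        raise ValueError(f"endpoint needs an http(s) scheme: {endpoint!r}")
    if signal not in _SIGNAL_PATHS:
        raise ValueError(f"unknown signal: {signal!r}")
    if transport == "grpc":
        # OTLP/gRPC selects the signal by RPC service, not by URL path.
        return endpoint
    return endpoint.rstrip("/") + "/" + _SIGNAL_PATHS[signal]
```

Under that assumption, the Grafana Cloud recipe's .../otlp endpoint would receive traces at .../otlp/v1/traces.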

Metrics

Alongside traces, metrics are exported over OTLP on an interval (default 60s):

Metric                             Type        Measures
bitrouter.requests                 Counter     Requests processed
gen_ai.client.operation.duration   Histogram   Request latency
gen_ai.client.token.usage          Histogram   Token counts (by gen_ai.token.type)
bitrouter.errors                   Counter     Errors
bitrouter.stream_parts             Counter     Streaming parts emitted

Dimensions include gen_ai.provider.name, gen_ai.response.model, outcome, account_label, and the caller identifiers. To keep metric cardinality bounded on shared deployments, api_key_id and user_id are capped (defaults: 1024 and 256 distinct values); beyond the cap, values collapse to an overflow bucket.
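The cap behaves like a per-dimension admission set. The sketch below admits the first limit distinct values and maps every later new value to one overflow label. First-come admission, the absence of any reset, and the __overflow__ label are assumptions; only the default limits (1024 for api_key_id, 256 for user_id) come from this page.

```python
from typing import Set

# Hypothetical overflow label; BitRouter's actual bucket value isn't documented here.
OVERFLOW = "__overflow__"


class CardinalityCap:
    """Admit the first `limit` distinct values of a metric dimension and
    collapse every later new value into a single overflow bucket."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.seen: Set[str] = set()

    def label(self, value: str) -> str:
        if value in self.seen:
            return value  # already admitted: keep its own series
        if len(self.seen) < self.limit:
            self.seen.add(value)
            return value
        return OVERFLOW  # cap reached: share one series


# One cap per dimension, using the documented defaults.
api_key_cap = CardinalityCap(1024)
user_cap = CardinalityCap(256)
```

The point of the design is that the number of time series per metric stays bounded no matter how many callers a shared deployment sees; the price is that traffic beyond the cap is aggregated rather than attributed.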

There is no Prometheus scrape endpoint. GET /metrics is retired — metrics are pushed via OTLP only. If your stack is Prometheus-based, ingest through an OpenTelemetry Collector with the Prometheus exporter.

Backend recipes

Each block is the plugins.bitrouter-observe.otel config for a common backend.

OpenTelemetry Collector

Send everything to a local or in-cluster Collector and let it fan out to your real backends. Because BitRouter has no scrape endpoint, this is also the path into a Prometheus-based stack, via the Collector's Prometheus exporter:

otel:
  endpoint: "http://otel-collector:4318"
  service_name: "bitrouter"
  resource_attributes:
    deployment.environment: "prod"

Honeycomb

otel:
  endpoint: "https://api.honeycomb.io"
  service_name: "bitrouter"
  headers:
    x-honeycomb-team: "${HONEYCOMB_API_KEY}"

Grafana Cloud / Tempo

Grafana Cloud's OTLP gateway uses HTTP basic auth, so GRAFANA_OTLP_TOKEN must hold the base64 encoding of <instance ID>:<API token>, not the raw token. For self-hosted Tempo, point the endpoint at Tempo's OTLP port and drop the header.

otel:
  endpoint: "https://otlp-gateway-<region>.grafana.net/otlp"
  service_name: "bitrouter"
  headers:
    Authorization: "Basic ${GRAFANA_OTLP_TOKEN}"
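
Computing the GRAFANA_OTLP_TOKEN value is standard HTTP basic-auth encoding. A minimal sketch using Python's base64 module; grafana_basic_token is a hypothetical name, and the instance ID and token you pass in come from your Grafana Cloud stack:

```python
import base64


def grafana_basic_token(instance_id: str, api_token: str) -> str:
    """Return base64("<instance ID>:<API token>") for GRAFANA_OTLP_TOKEN
    (hypothetical helper)."""
    raw = f"{instance_id}:{api_token}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")
```

Export the returned string as GRAFANA_OTLP_TOKEN; the Authorization header then resolves to Basic followed by the encoded value.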

Datadog

Datadog ingests OTLP through the Datadog Agent rather than a public OTLP URL — run the Agent with OTLP receiving enabled and point BitRouter at it:

otel:
  endpoint: "http://datadog-agent:4318"
  service_name: "bitrouter"
  resource_attributes:
    deployment.environment: "prod"

Tune sampling

By default BitRouter respects the inbound trace decision and otherwise samples everything (parentbased_always_on). At high throughput, sample a fraction of traces instead:

otel:
  endpoint: "http://otel-collector:4318"
  sampler: "parentbased_traceidratio"
  sampler_arg: 0.1                       # keep 10% of root traces

sampler                    Behavior
always_on                  Sample every trace
always_off                 Sample nothing
traceidratio               Sample a fraction (sampler_arg), ignoring the parent
parentbased_always_on      Follow the parent; sample if there is no parent (default)
parentbased_always_off     Follow the parent; drop if there is no parent
parentbased_traceidratio   Follow the parent; otherwise sample a fraction (sampler_arg)

parentbased_* variants honor the upstream decision, so a trace your agent started won't be half-sampled at the router. The metrics export interval and the trace batch queue are tunable separately under metrics and traces.batch if you need to trade freshness for overhead.
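The sampler semantics can be sketched as a single decision function. The ratio check below mirrors the approach of the OpenTelemetry Go SDK's TraceIDRatioBased sampler (compare the low 8 bytes of the trace ID, shifted to 63 bits, against ratio * 2^63), so the decision is deterministic per trace ID. That BitRouter uses exactly this algorithm is an assumption, and should_sample is a hypothetical name.

```python
from typing import Optional

_TWO_POW_63 = 1 << 63


def ratio_sample(trace_id: str, ratio: float) -> bool:
    """Deterministic ratio decision from a 32-hex trace ID (assumed algorithm)."""
    ratio = min(max(ratio, 0.0), 1.0)
    low63 = int(trace_id[16:], 16) >> 1  # low 8 bytes, shifted to 63 bits
    return low63 < int(ratio * _TWO_POW_63)


def should_sample(sampler: str, trace_id: str,
                  parent_sampled: Optional[bool], arg: float = 1.0) -> bool:
    """Decide per the sampler table; parent_sampled is None for root spans."""
    if sampler.startswith("parentbased_"):
        if parent_sampled is not None:
            return parent_sampled  # honor the upstream decision
        root = sampler[len("parentbased_"):]
    else:
        root = sampler  # plain samplers ignore the parent entirely
    if root == "always_on":
        return True
    if root == "always_off":
        return False
    if root == "traceidratio":
        return ratio_sample(trace_id, arg)
    raise ValueError(f"unknown sampler: {sampler!r}")
```

With sampler_arg: 0.1, a root trace is kept when its 63-bit value falls in the lowest tenth of the range, which keeps roughly 10% of root traces; because the input is the trace ID, every service applying the same ratio keeps the same traces.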

Content capture

Prompt and response content is excluded by default (content_capture: off). Turn it on only when you need prompt and response bodies on the spans for debugging:

otel:
  content_capture: "full"   # off (default) | full

full writes user prompts and model responses into your telemetry backend. That content then inherits the backend's access controls and retention. For shared or regulated environments, leave it off and capture content only in a scoped, short-lived debugging session.
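The off/full switch amounts to a gate in front of the span attributes. A minimal sketch: with off, no message bodies are attached; with full, prompt and response are serialized onto the span. The attribute names and the choice to reject unknown modes are assumptions, not BitRouter's documented behavior.

```python
import json
from typing import Any, Dict, List

VALID_MODES = ("off", "full")


def content_attributes(mode: str, prompt: List[Dict[str, Any]],
                       response: List[Dict[str, Any]]) -> Dict[str, str]:
    """Return the content attributes to attach to a span for a given
    content_capture mode. Attribute names are hypothetical placeholders."""
    if mode not in VALID_MODES:
        # Reject typos rather than guess which way the user meant.
        raise ValueError(f"content_capture must be one of {VALID_MODES}, got {mode!r}")
    if mode == "off":
        return {}  # default: no message bodies reach the backend
    return {
        "gen_ai.input.messages": json.dumps(prompt),
        "gen_ai.output.messages": json.dumps(response),
    }
```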

Verify

Reload (or restart) the router, then ask the running daemon what it's doing:

bitrouter reload                  # pick up config changes without dropping connections
bitrouter observe status          # endpoint, sampler, cardinality, in-flight spans
bitrouter observe status --json

If it reports stopped, the exporter isn't wired — check that the otel block has an endpoint (or that OTEL_EXPORTER_OTLP_ENDPOINT is set) and that the binary was built with an OTLP transport feature. Then send a request through the router and confirm the trace lands in your backend; you should see one inbound chat span per request with a CLIENT child for each upstream attempt.
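If you can pull span data back out of your backend, the expected shape can be checked mechanically. The sketch below assumes spans already filtered to BitRouter's service.name and represented as dicts with hypothetical keys span_id, parent_id, name, and kind; it counts CLIENT chat children per inbound chat span and fails if any inbound span has none.

```python
from typing import Any, Dict, List


def upstream_attempts(spans: List[Dict[str, Any]]) -> Dict[str, int]:
    """Map each inbound chat span's ID to its number of upstream attempts.

    Hypothetical helper. Raises ValueError if the span tree doesn't match
    the documented shape (one inbound chat span, CLIENT chat children).
    """
    inbound = {s["span_id"] for s in spans
               if s["name"] == "chat" and s["kind"] == "INTERNAL"}
    counts: Dict[str, int] = {span_id: 0 for span_id in inbound}
    for s in spans:
        if s["name"] == "chat" and s["kind"] == "CLIENT":
            parent = s.get("parent_id")
            if parent not in counts:
                raise ValueError(f"CLIENT chat span {s['span_id']} has no inbound chat parent")
            counts[parent] += 1
    empty = sorted(sid for sid, n in counts.items() if n == 0)
    if empty:
        raise ValueError(f"inbound chat spans with no upstream attempt: {empty}")
    return counts
```

A request that failed over once should map to 2; a request served on the first try maps to 1.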
