BitRouter optimizes your LLM runtime across cost, performance, and security — all from a single open-source Rust proxy running locally with zero infrastructure dependencies. Advanced features are available on BitRouter Cloud (coming soon).

Cost Optimization

Intelligent Routing

Priority-based provider selection and round-robin load balancing
Direct provider routing via provider:model_id syntax
Automatic failover on 5xx errors or timeouts — transparent to the agent
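
The routing behaviour listed above can be sketched in a few lines. This is a minimal illustration, not BitRouter's actual implementation: the `Attempt` type and the `parse_direct`, `should_failover`, and `route_with_failover` names are hypothetical, and the sketch assumes providers are already sorted by priority.

```rust
/// Outcome of one upstream attempt (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum Attempt {
    Ok(String),
    Status(u16),
    Timeout,
}

/// Splits a `provider:model_id` target at the first colon.
/// Returns None for a bare route name such as `default`.
pub fn parse_direct(target: &str) -> Option<(&str, &str)> {
    match target.split_once(':') {
        Some((provider, model)) if !provider.is_empty() && !model.is_empty() => {
            Some((provider, model))
        }
        _ => None,
    }
}

/// True when the attempt should trigger failover: any 5xx status or a timeout.
pub fn should_failover(attempt: &Attempt) -> bool {
    match attempt {
        Attempt::Status(code) => (500..=599).contains(code),
        Attempt::Timeout => true,
        Attempt::Ok(_) => false,
    }
}

/// Tries providers in priority order and returns the first result that does
/// not qualify for failover. Returns None when every provider failed over.
pub fn route_with_failover<F>(providers: &[&str], mut call: F) -> Option<(String, Attempt)>
where
    F: FnMut(&str) -> Attempt,
{
    for &provider in providers {
        let attempt = call(provider);
        if !should_failover(&attempt) {
            return Some((provider.to_string(), attempt));
        }
    }
    None
}
```

In this sketch a 4xx response is returned to the caller unchanged, since the source only names 5xx errors and timeouts as failover triggers.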

Spend Tracking & Budgets

Per-request cost calculation with granular token breakdown (cache reads, cache writes, reasoning tokens)
Persistent spend logs per account, model, and provider
Budget controls via JWT claims: per-session or per-account limits in micro-USD, over time-based or round-based windows
Model allowlists restrict which models a token can access (glob patterns)
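
As a rough illustration of how a budget limit and a model allowlist might be checked together, the sketch below uses a hypothetical `TokenClaims` struct. The field names are assumptions rather than BitRouter's real claim names, the glob matcher supports only `*`, and budget windows are left out. The sketch also assumes a model must match at least one allowlist pattern.

```rust
/// Minimal glob matcher supporting only `*` (any run of characters).
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    // (pattern index after the last '*', text index that '*' currently ends at)
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi + 1, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last '*' absorb one more character.
            pi = sp;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Whatever remains of the pattern must consist of '*' only.
    p[pi..].iter().all(|&c| c == '*')
}

/// Hypothetical claim set; amounts are in micro-USD (1 USD = 1_000_000).
pub struct TokenClaims {
    pub allowed_models: Vec<String>,
    pub budget_micro_usd: u64,
}

/// Rejects the request if the model is not allowlisted or if the request
/// would push accumulated spend over the budget.
pub fn authorize(
    claims: &TokenClaims,
    model: &str,
    spent_micro_usd: u64,
    request_cost_micro_usd: u64,
) -> Result<(), String> {
    if !claims.allowed_models.iter().any(|pat| glob_match(pat, model)) {
        return Err(format!("model '{}' is not in the allowlist", model));
    }
    if spent_micro_usd.saturating_add(request_cost_micro_usd) > claims.budget_micro_usd {
        return Err("budget exceeded".to_string());
    }
    Ok(())
}
```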

Performance Optimization

Async Rust Runtime

Tokio-based async I/O with sub-10ms routing overhead
Zero infrastructure dependencies — no Postgres, Redis, or Docker
First-class streaming with granular stream events (TextDelta, ToolCall, Finish)
Shared connection pooling across providers

Observability & Metrics

BitRouter tracks every request that passes through the proxy. All observability goes through the `ObserveCallback` trait, which fires after every request with full context.

Each request event includes:

Field	Description
`route`	The virtual model name (e.g. `default`, `fast`)
`provider`	The upstream provider that handled the request
`model`	The upstream model ID
`account_id`	The caller's account (from JWT)
`latency_ms`	End-to-end request latency
`input_tokens`	Input token count
`output_tokens`	Output token count
`success`	Whether the request succeeded
`error_type`	Error category on failure (e.g. `Transport`, `Provider`, `AccessDenied`)
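
To make the event shape concrete, here is a sketch of a struct carrying the fields above, plus a small observer that counts requests and errors. The `ObserveSketch` trait is a hypothetical stand-in: the real `ObserveCallback` signature is not shown in this document, and the field types are guesses.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Mirrors the fields in the table above; the Rust types are assumptions.
#[derive(Debug, Clone)]
pub struct RequestEvent {
    pub route: String,
    pub provider: String,
    pub model: String,
    pub account_id: Option<String>,
    pub latency_ms: u64,
    pub input_tokens: Option<u64>,
    pub output_tokens: Option<u64>,
    pub success: bool,
    pub error_type: Option<String>,
}

/// Hypothetical stand-in for the ObserveCallback trait (not the real API).
pub trait ObserveSketch: Send + Sync {
    fn on_request(&self, event: &RequestEvent);
}

/// Counts total requests and failed requests using lock-free atomics.
#[derive(Default)]
pub struct ErrorCounter {
    pub total: AtomicU64,
    pub errors: AtomicU64,
}

impl ObserveSketch for ErrorCounter {
    fn on_request(&self, event: &RequestEvent) {
        self.total.fetch_add(1, Ordering::Relaxed);
        if !event.success {
            self.errors.fetch_add(1, Ordering::Relaxed);
        }
    }
}
```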

BitRouter collects in-memory per-route metrics with per-endpoint breakdowns:

Request count — total requests and errors per route and endpoint
Error rate — errors / total requests
Latency percentiles — p50 and p99 (bounded sampling, up to 10k samples per route)
Average token usage — avg input and output tokens (non-streaming requests only)
Last used — timestamp of the most recent request
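
The metrics above can be approximated with a small accumulator. The sketch below is one possible reading of "bounded sampling": it keeps at most 10,000 latency samples in a ring buffer and computes nearest-rank percentiles. The overwrite-oldest policy, the `RouteMetrics` name, and the percentile method are all assumptions, not BitRouter's documented behaviour.

```rust
/// Upper bound on retained latency samples per route.
pub const MAX_SAMPLES: usize = 10_000;

#[derive(Debug, Default)]
pub struct RouteMetrics {
    pub total: u64,
    pub errors: u64,
    samples: Vec<u64>, // latency samples in milliseconds
    next: usize,       // next ring-buffer slot to overwrite once full
}

impl RouteMetrics {
    /// Records one finished request.
    pub fn record(&mut self, latency_ms: u64, success: bool) {
        self.total += 1;
        if !success {
            self.errors += 1;
        }
        if self.samples.len() < MAX_SAMPLES {
            self.samples.push(latency_ms);
        } else {
            // Buffer is full: overwrite the oldest sample (assumed policy).
            self.samples[self.next] = latency_ms;
            self.next = (self.next + 1) % MAX_SAMPLES;
        }
    }

    /// Error rate = errors / total requests (0.0 when nothing was recorded).
    pub fn error_rate(&self) -> f64 {
        if self.total == 0 {
            0.0
        } else {
            self.errors as f64 / self.total as f64
        }
    }

    /// Nearest-rank percentile for `p` in 0..=100, using integer arithmetic.
    pub fn percentile(&self, p: usize) -> Option<u64> {
        if self.samples.is_empty() {
            return None;
        }
        let mut sorted = self.samples.clone();
        sorted.sort_unstable();
        let rank = (p * sorted.len() + 99) / 100; // ceil(p * n / 100)
        Some(sorted[rank.clamp(1, sorted.len()) - 1])
    }
}
```

Sorting a copy on every query keeps recording cheap, which suits a workload where requests far outnumber metric reads.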

Query metrics via the admin API:

curl http://localhost:8787/admin/metrics

Metrics are held in memory only and reset on process restart.

Spend Tracking

BitRouter calculates cost for every request using the provider's token pricing and persists spend logs to the configured database.

Pricing is specified per million tokens, separately for each of the following token categories:

Token category	Description
`input_tokens.no_cache`	Non-cached input tokens
`input_tokens.cache_read`	Cache-read input tokens
`input_tokens.cache_write`	Cache-write input tokens
`output_tokens.text`	Text output tokens
`output_tokens.reasoning`	Reasoning output tokens
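
As a worked example of per-million-token pricing, the sketch below multiplies each token category by its price and sums the result. The struct names, the choice of micro-USD as the cost unit, and the rounding (truncation after summing) are assumptions made for illustration.

```rust
/// Token counts per category, matching the table above.
#[derive(Debug, Clone, Copy, Default)]
pub struct TokenUsage {
    pub input_no_cache: u64,
    pub input_cache_read: u64,
    pub input_cache_write: u64,
    pub output_text: u64,
    pub output_reasoning: u64,
}

/// Prices in micro-USD per one million tokens (hypothetical layout).
#[derive(Debug, Clone, Copy, Default)]
pub struct Pricing {
    pub input_no_cache: u64,
    pub input_cache_read: u64,
    pub input_cache_write: u64,
    pub output_text: u64,
    pub output_reasoning: u64,
}

/// Request cost in micro-USD: sum over categories of tokens * price / 1_000_000.
/// The sum is taken in u128 before dividing, so rounding happens only once.
pub fn cost_micro_usd(usage: &TokenUsage, price: &Pricing) -> u64 {
    let total: u128 = [
        (usage.input_no_cache, price.input_no_cache),
        (usage.input_cache_read, price.input_cache_read),
        (usage.input_cache_write, price.input_cache_write),
        (usage.output_text, price.output_text),
        (usage.output_reasoning, price.output_reasoning),
    ]
    .iter()
    .map(|&(tokens, per_million)| tokens as u128 * per_million as u128)
    .sum();
    (total / 1_000_000) as u64
}
```

With made-up prices of 3 USD per million uncached input tokens, 0.30 USD per million cache reads, and 15 USD per million output tokens, a request with 1,000 uncached input tokens, 2,000 cache-read tokens, 500 text output tokens, and 100 reasoning tokens costs 3,000 + 600 + 7,500 + 1,500 = 12,600 micro-USD, or about 0.0126 USD.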

Spend logs are stored in the configured database (sqlite://, postgres://, or mysql://). The table is created automatically via migrations.

In the in-memory metrics, streaming requests record only time-to-stream-start latency and request/error counts, because token usage is not available until the stream completes.

Security Optimization

Runtime Guardrails

BitRouter includes a proxy-layer firewall that inspects content flowing between agents and LLM providers. It wraps the model router transparently — your agents and providers don't need any changes.

Guardrails are enabled by default. Traffic is inspected in two directions:

Upgoing — outbound traffic from your agent to the LLM provider (user → model)
Downgoing — inbound traffic from the LLM provider back to your agent (model → user)

Built-in Patterns

Pattern ID	Direction	Detects
`api_keys`	upgoing	API keys from OpenAI, Anthropic, AWS, GCP, GitHub, Stripe
`private_keys`	upgoing	PEM-encoded private keys (RSA, EC, Ed25519, etc.)
`credentials`	upgoing	Inline passwords, basic-auth headers, database connection strings
`pii_emails`	upgoing	Email addresses
`pii_phone_numbers`	upgoing	Phone numbers
`ip_addresses`	upgoing	IPv4 addresses (non-localhost)
`suspicious_commands`	downgoing	Dangerous shell commands (`rm -rf /`, `curl | sh`, fork bombs, etc.)

Actions

Each pattern can be assigned one of three actions:

Action	Behavior
`warn`	Log a warning but allow the content through (default)
`redact`	Replace matched content before forwarding
`block`	Reject the entire request with an error

Configuration

guardrails:
  enabled: true

  disabled_patterns:
    - ip_addresses

  upgoing:
    api_keys: redact
    private_keys: block
    credentials: block
    pii_emails: warn

  downgoing:
    suspicious_commands: block

  custom_patterns:
    - name: internal_token
      regex: "myapp_[A-Za-z0-9]{32}"
      direction: upgoing    # upgoing | downgoing | both

  custom_upgoing:
    internal_token: redact

  block_message:
    include_details: true
    include_help_link: true
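
To illustrate how the three actions differ, the sketch below applies each of them to the `internal_token` custom pattern from the configuration above. It is a simplified model, not BitRouter's implementation: the regex is replaced by a hand-written matcher so the example needs only the standard library, and the `Action` and `Verdict` types, the `[REDACTED]` placeholder, and the rejection message are all assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Warn,
    Redact,
    Block,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Content is forwarded, possibly rewritten.
    Forward(String),
    /// The whole request is rejected with an error message.
    Rejected(String),
}

/// Hand-rolled stand-in for the regex `myapp_[A-Za-z0-9]{32}`.
/// Returns the byte ranges (start, end) of every match.
pub fn find_internal_tokens(text: &str) -> Vec<(usize, usize)> {
    const PREFIX: &str = "myapp_";
    const BODY_LEN: usize = 32;
    let bytes = text.as_bytes();
    let mut hits = Vec::new();
    let mut i = 0;
    while let Some(pos) = text[i..].find(PREFIX) {
        let start = i + pos;
        let body_start = start + PREFIX.len();
        let body_end = body_start + BODY_LEN;
        if body_end <= bytes.len()
            && bytes[body_start..body_end].iter().all(|b| b.is_ascii_alphanumeric())
        {
            hits.push((start, body_end));
            i = body_end;
        } else {
            i = body_start; // no match here; keep scanning after the prefix
        }
    }
    hits
}

/// Applies one action to the text; content without matches passes unchanged.
pub fn apply(action: Action, text: &str) -> Verdict {
    let hits = find_internal_tokens(text);
    if hits.is_empty() {
        return Verdict::Forward(text.to_string());
    }
    match action {
        Action::Warn => {
            eprintln!("guardrail warning: {} match(es) for internal_token", hits.len());
            Verdict::Forward(text.to_string())
        }
        Action::Redact => {
            let mut out = String::with_capacity(text.len());
            let mut last = 0;
            for (start, end) in hits {
                out.push_str(&text[last..start]);
                out.push_str("[REDACTED]");
                last = end;
            }
            out.push_str(&text[last..]);
            Verdict::Forward(out)
        }
        Action::Block => Verdict::Rejected("blocked by guardrail: internal_token".to_string()),
    }
}
```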

Authentication & Agent Identity

Web3 JWT auth with Solana (Ed25519) and EVM (secp256k1) verification — zero server-side state
Scoped tokens (admin vs API), model allowlists, budget claims
ACP Agent Cards bound to CAIP-10 wallet addresses with discovery endpoints

Advanced runtime optimization features — including analytics dashboards, team-level policy controls, and hosted guardrail management — are coming soon on BitRouter Cloud.

For full source code, visit the BitRouter GitHub repo.
