Runtime Optimization

BitRouter optimizes your LLM runtime across cost, performance, and security — all from a single open-source Rust proxy running locally with zero infrastructure dependencies. Advanced features are available on BitRouter Cloud (coming soon).

Cost Optimization

Intelligent Routing

  • Priority-based provider selection and round-robin load balancing
  • Direct provider routing via provider:model_id syntax
  • Automatic failover on 5xx errors or timeouts — transparent to the agent
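The priority and failover behavior above can be illustrated with a small sketch. This is not BitRouter's implementation: `Provider`, `UpstreamError`, `parse_target`, `route`, and `failing` are hypothetical names, the "lower number is tried first" ordering is an assumption, and so is the choice to surface non-5xx errors to the caller without failover. Round-robin balancing is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


class UpstreamError(Exception):
    """Hypothetical error type: `status` is an HTTP status, or None for a timeout."""

    def __init__(self, status: Optional[int] = None):
        super().__init__(f"upstream failure (status={status})")
        self.status = status


@dataclass
class Provider:
    name: str
    priority: int               # assumption: lower number is tried first
    call: Callable[[str], str]  # hypothetical: sends a prompt, returns text


def parse_target(model: str) -> Tuple[Optional[str], str]:
    """Split 'provider:model_id'; a bare name is treated as a virtual route."""
    provider, sep, model_id = model.partition(":")
    return (provider, model_id) if sep else (None, model)


def is_retryable(err: UpstreamError) -> bool:
    # Fail over only on timeouts (no status) and 5xx responses.
    return err.status is None or 500 <= err.status <= 599


def route(providers: List[Provider], prompt: str) -> Tuple[str, str]:
    """Try providers in priority order; return (provider_name, response)."""
    last_err: Optional[UpstreamError] = None
    for p in sorted(providers, key=lambda p: p.priority):
        try:
            return p.name, p.call(prompt)
        except UpstreamError as err:
            if not is_retryable(err):
                raise  # assumption: e.g. a 4xx is surfaced unchanged
            last_err = err
    raise last_err or UpstreamError()


def failing(status: Optional[int]) -> Callable[[str], str]:
    """Stand-in provider call that always fails with the given status."""
    def call(_prompt: str) -> str:
        raise UpstreamError(status)
    return call
```

With a primary provider that returns 503 and a healthy backup, `route` returns the backup's response, so the calling agent never sees the failure.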

Spend Tracking & Budgets

  • Per-request cost calculation with granular token breakdown (cache reads, cache writes, reasoning tokens)
  • Persistent spend logs per account, model, and provider
  • Budget controls via JWT claims: per-session or per-account limits in micro-USD (1 USD = 1,000,000 micro-USD), enforced over time-based or round-based windows
  • Model allowlists restrict which models a token can access (glob patterns)
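As a rough illustration of the allowlist and budget checks listed above, the sketch below uses Python's `fnmatch` for glob matching and plain integers for micro-USD amounts. The function names and the example model IDs are hypothetical, and BitRouter's glob dialect and claim layout may differ.

```python
from fnmatch import fnmatchcase
from typing import List


def model_allowed(model: str, allowlist: List[str]) -> bool:
    """True if the model ID matches at least one glob pattern."""
    return any(fnmatchcase(model, pattern) for pattern in allowlist)


def within_budget(spent_micro_usd: int, cost_micro_usd: int,
                  limit_micro_usd: int) -> bool:
    """True if adding this request's cost keeps spend within the limit.

    All amounts are integers in micro-USD (1 USD = 1_000_000 micro-USD),
    which avoids floating-point drift when summing many small costs.
    """
    return spent_micro_usd + cost_micro_usd <= limit_micro_usd
```

For example, a token with the allowlist `["openai:*"]` could call `openai:gpt-4o` but not `anthropic:claude-3`, and a 5 USD limit corresponds to `5_000_000` micro-USD.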

Performance Optimization

Async Rust Runtime

  • Tokio-based async I/O with sub-10ms routing overhead
  • Zero infrastructure dependencies: no Postgres, Redis, or Docker required
  • First-class streaming with granular stream events (TextDelta, ToolCall, Finish)
  • Shared connection pooling across providers

Observability & Metrics

BitRouter tracks every request that flows through the proxy. All observability flows through the ObserveCallback trait, which fires after every request with full context.

Each request event includes:

Field          Description
route          The virtual model name (e.g. default, fast)
provider       The upstream provider that handled the request
model          The upstream model ID
account_id     The caller's account (from JWT)
latency_ms     End-to-end request latency
input_tokens   Input token count
output_tokens  Output token count
success        Whether the request succeeded
error_type     Error category on failure (e.g. Transport, Provider, AccessDenied)

BitRouter collects in-memory per-route metrics with per-endpoint breakdowns:

  • Request count — total requests and errors per route and endpoint
  • Error rate — errors / total requests
  • Latency percentiles — p50 and p99 (bounded sampling, up to 10k samples per route)
  • Average token usage — avg input and output tokens (non-streaming requests only)
  • Last used — timestamp of the most recent request
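To make the metrics above concrete, here is a minimal sketch of a per-route aggregator. It is not BitRouter's code: `RouteMetrics` is a hypothetical name, "bounded sampling" is modeled as keeping only the most recent samples, and percentiles use the nearest-rank method, all of which are assumptions.

```python
import math
from collections import deque
from typing import Optional


class RouteMetrics:
    def __init__(self, max_samples: int = 10_000):
        self.total = 0
        self.errors = 0
        # Assumption: once full, the oldest latency sample is dropped.
        self.latencies = deque(maxlen=max_samples)

    def record(self, latency_ms: float, success: bool) -> None:
        self.total += 1
        if not success:
            self.errors += 1
        self.latencies.append(latency_ms)

    def error_rate(self) -> float:
        # errors / total requests, defined as 0.0 before any traffic
        return self.errors / self.total if self.total else 0.0

    def percentile(self, p: int) -> Optional[float]:
        """Nearest-rank percentile over the retained samples (p in 1..100)."""
        if not self.latencies:
            return None
        ordered = sorted(self.latencies)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]
```

Because the sample buffer is bounded, counters such as `total` keep growing while percentiles reflect only the retained window.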

Query metrics via the admin API:

curl http://localhost:8787/admin/metrics

Metrics are held in memory only and reset on process restart.

Spend Tracking

BitRouter calculates the cost of every request from the provider's token pricing. Prices are expressed per million tokens and broken down by token category:

Token category            Description
input_tokens.no_cache     Non-cached input tokens
input_tokens.cache_read   Cache-read input tokens
input_tokens.cache_write  Cache-write input tokens
output_tokens.text        Text output tokens
output_tokens.reasoning   Reasoning output tokens
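The arithmetic behind per-category pricing can be sketched as follows. The prices are made-up illustration values, and `cost_micro_usd` is a hypothetical helper, not a BitRouter function. One convenient property: when a price is quoted in USD per million tokens, `tokens * price` is already the cost in micro-USD.

```python
from decimal import Decimal
from typing import Dict

# Hypothetical prices in USD per million tokens; real values depend on the provider.
PRICES: Dict[str, Decimal] = {
    "input_tokens.no_cache": Decimal("3.00"),
    "input_tokens.cache_read": Decimal("0.30"),
    "input_tokens.cache_write": Decimal("3.75"),
    "output_tokens.text": Decimal("15.00"),
    "output_tokens.reasoning": Decimal("15.00"),
}


def cost_micro_usd(usage: Dict[str, int], prices: Dict[str, Decimal]) -> int:
    """Sum tokens * price per category and return whole micro-USD.

    tokens / 1e6 (millions of tokens) * price (USD) * 1e6 (micro-USD per USD)
    simplifies to tokens * price.
    """
    total = sum((Decimal(n) * prices[cat] for cat, n in usage.items()), Decimal(0))
    return int(total.to_integral_value())


usage = {
    "input_tokens.no_cache": 1000,
    "input_tokens.cache_read": 4000,
    "input_tokens.cache_write": 0,
    "output_tokens.text": 500,
    "output_tokens.reasoning": 200,
}
request_cost = cost_micro_usd(usage, PRICES)  # 3000 + 1200 + 0 + 7500 + 3000
```

With these illustration prices the request costs 14,700 micro-USD, or 0.0147 USD.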

Spend logs are stored in the configured database (sqlite://, postgres://, or mysql://). The table is created automatically via migrations.

For streaming requests, metrics record only the time-to-stream-start latency and the request/error counts, because token usage is not available until the stream completes.

Security Optimization

Runtime Guardrails

BitRouter includes a proxy-layer firewall that inspects content flowing between agents and LLM providers. It wraps the model router transparently — your agents and providers don't need any changes.

Guardrails are enabled by default. Traffic is inspected in two directions:

  • Upgoing — outbound traffic from your agent to the LLM provider (user → model)
  • Downgoing — inbound traffic from the LLM provider back to your agent (model → user)

Built-in Patterns

Pattern ID           Direction  Detects
api_keys             upgoing    API keys from OpenAI, Anthropic, AWS, GCP, GitHub, Stripe
private_keys         upgoing    PEM-encoded private keys (RSA, EC, Ed25519, etc.)
credentials          upgoing    Inline passwords, basic-auth headers, database connection strings
pii_emails           upgoing    Email addresses
pii_phone_numbers    upgoing    Phone numbers
ip_addresses         upgoing    IPv4 addresses (non-localhost)
suspicious_commands  downgoing  Dangerous shell commands (rm -rf /, curl | sh, fork bombs, etc.)

Actions

Each pattern can be assigned one of three actions:

Action  Behavior
warn    Log a warning but allow the content through (default)
redact  Replace matched content before forwarding
block   Reject the entire request with an error
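The three actions can be illustrated with a small sketch. The regular expressions below are simplified stand-ins rather than BitRouter's built-in patterns, and `apply_guardrails`, `Blocked`, and the `[REDACTED:...]` placeholder format are hypothetical.

```python
import re
from typing import List, Pattern, Tuple

Rule = Tuple[str, Pattern[str], str]  # (pattern_id, regex, action)


class Blocked(Exception):
    """Raised when a 'block' rule matches; carries the pattern ID."""


def apply_guardrails(text: str, rules: List[Rule]) -> Tuple[str, List[str]]:
    """Return (possibly redacted text, IDs of patterns that only warned)."""
    warnings: List[str] = []
    for pattern_id, regex, action in rules:
        if not regex.search(text):
            continue
        if action == "block":
            raise Blocked(pattern_id)  # reject the whole request
        if action == "redact":
            placeholder = f"[REDACTED:{pattern_id}]"
            text = regex.sub(lambda _m: placeholder, text)
        else:  # "warn": record the match, pass the content through
            warnings.append(pattern_id)
    return text, warnings


# Simplified illustration patterns (assumptions, not the real built-ins).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PEM_KEY = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")
```

A `redact` rule rewrites only the matched span and lets the rest of the message through, whereas a `block` rule stops the request before anything is forwarded.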

Configuration

guardrails:
  enabled: true

  disabled_patterns:
    - ip_addresses

  upgoing:
    api_keys: redact
    private_keys: block
    credentials: block
    pii_emails: warn

  downgoing:
    suspicious_commands: block

  custom_patterns:
    - name: internal_token
      regex: "myapp_[A-Za-z0-9]{32}"
      direction: upgoing    # upgoing | downgoing | both

  custom_upgoing:
    internal_token: redact

  block_message:
    include_details: true
    include_help_link: true

Authentication & Agent Identity

  • Web3 JWT auth with Solana (Ed25519) and EVM (secp256k1) verification — zero server-side state
  • Scoped tokens (admin vs API), model allowlists, budget claims
  • ACP Agent Cards bound to CAIP-10 wallet addresses with discovery endpoints

Advanced runtime optimization features — including analytics dashboards, team-level policy controls, and hosted guardrail management — are coming soon on BitRouter Cloud.

For full source code, visit the BitRouter GitHub repo.
