Question 1

What is an AI model router?

Accepted Answer

An AI model router is a unified API layer that sits between your AI agent and the upstream LLM providers. Instead of hardcoding a single provider into your application, you point every model call at the router and it intelligently selects the best available model based on cost, latency, capability, and provider health. BitRouter goes further than a simple proxy: it handles failover, per-run observability, prompt-injection guardrails, and task-complexity-based model matching — all without any changes to your agent code.

Question 2

How is BitRouter different from OpenRouter?

Accepted Answer

OpenRouter is a closed-source hosted gateway — no self-host option, no agent-native primitives, no permissionless registry. BitRouter is Apache 2.0: fork the binary and run it anywhere, or use the hosted edge if you don't want to operate it. The provider registry is fully open — anyone can publish a provider via pull request with no review queue or approval process. The result is no lock-in at any layer — swap models, switch agent harnesses, or self-host the router itself — plus router-level guardrails, per-run cost attribution, MCP/ACP/Skills gateway support, and intent-aware routing that OpenRouter does not offer.

Question 3

How is BitRouter different from LiteLLM?

Accepted Answer

LiteLLM is an open-source Python library you embed inside your application code. BitRouter is a standalone binary that runs as a sidecar or hosted edge — you drop it in front of any runtime (Claude Code, Cursor, Codex, your own agent) without modifying each service. It comes with auth, billing, observability, guardrails, and an MCP/ACP/Skills gateway built in. You configure policy once at the router rather than repeating safety and routing logic in every service that calls an LLM.

Question 4

Which AI models does BitRouter support?

Accepted Answer

BitRouter's cost advantage comes from open models: the open provider registry carries Qwen 3.7, DeepSeek V4 Pro, Kimi K2.6, GLM 5.1, MiniMax M3, StepFun 3.7, and Mimo V2.5 Pro, and routes the routine majority of an agent's calls to them at a fraction of frontier prices — any provider hosting a model can publish a listing and receive traffic immediately. Frontier models stay one alias away for the calls that need them: Claude Fable 5 / Claude Opus 4.8 (Anthropic), GPT-5 and o3 (OpenAI), Gemini 3.1 Pro and 3.5 Flash (Google), Grok 4.3 (xAI). The model list updates automatically as providers publish new entries; no binary upgrade or alias change is needed on your end.

Question 5

How do I self-host BitRouter?

Accepted Answer

Pull the Apache 2.0 binary from github.com/bitrouter/bitrouter — it is a single binary with no daemon, no GUI, and no infrastructure dependencies beyond a network connection. It drops into any container, CI step, or bare VM. Self-hosted BitRouter gives you the same routing engine, guardrails, MCP/ACP/Skills gateway, and observability as the hosted edge, without the platform fee. Your traffic never leaves your infrastructure.

Question 6

Does BitRouter work with Claude Code and other coding agents?

Accepted Answer

Yes — BitRouter works with any agent harness that supports a configurable base URL or API key. Claude Code, GitHub Copilot, Codex, Opencode, Pi Agent, Hermes, and Openclaw all connect with a two-variable override (ANTHROPIC_BASE_URL or OPENAI_BASE_URL) and zero code changes — routing, failover, cost tracking, and guardrails apply automatically from that point forward. The same pattern works for any harness not yet in the list. Step-by-step setup for each integration is in the cookbook at /docs/integrations.


GPT-5.6 Sol	Terminus 2	85.77%	$1.02	373s
Claude Fable 5	Terminus 2	80.52%	$1.43	505s
GPT-5.5	Terminus 2	76.40%	$0.74	427s
▸GPT-5.5 + BitRouter	bitrouter · adaptive	75.26%‡	$0.50‡	—
Claude Sonnet 5	Terminus 2	74.53%	$0.80	635s
Claude Opus 4.8	Terminus 2	71.91%	$2.41	930s

request	cx	routed	cost	decision
fix auth.py test	0.18	qwen/qwen-3.7	$0.002	open
summarize thread	0.12	qwen/qwen-3.7	$0.002	open
design migration plan	0.62	gpt-5.5	$0.021	frontier
rank retrieval hits	0.30	deepseek-v4	$0.003	open

time	model	cost	lat	st
14:22:01	qwen/qwen-3.7	$0.002	82ms	ok
14:22:00	qwen/qwen-3.7	$0.002	91ms	ok
14:21:58	deepseek-v4	$0.003	101ms	ok
14:21:55	gpt-5.5	$0.021	140ms	ok

request	cx	bar	q	verdict
fix auth.py test	0.18	▓▓░░░░░░░░	0.91	open holds it
rank retrieval hits	0.30	▓▓▓░░░░░░░	0.88	open holds it
design migration plan	0.62	▓▓▓▓▓▓░░░░	0.94	escalate

Stop tokenmaxxing your agentic loops.

Cheap, fast, right — you don’t have to pick.

A 200-file refactor — ~2,400 model calls in one run.

Same tasks solved. A fraction of the bill.

Act. Observe. Evaluate. Update.

Route each call to the model that fits.

See every call, per run.

Score what the call actually needed.

Tune the policy from what it learned.

Questions before you ship.

Start routing in under a minute.