Point BitRouter at your own local or private model server — Ollama, vLLM, LM Studio, llama.cpp, or any OpenAI-compatible endpoint. 100% free in local mode.

Run the open-source bitrouter binary against a model server on your own machine — Ollama, vLLM, LM Studio, llama.cpp, or any OpenAI-compatible endpoint — and the whole path stays on localhost. No cloud account, no per-token fee, no key leaving your box. This is local mode, and it's free.

Why route a local model through BitRouter

A bare local server gives you one model at one URL. Putting BitRouter in front of it buys you the same surface you get for hosted models:

One endpoint, one protocol. Your agent runtime points at http://localhost:4356 and speaks the OpenAI (or Anthropic) wire format. Whether a request lands on a local Llama or a hosted Claude is a routing decision, not a client change.
Fall back between local and hosted. Declare a virtual model that tries your local server first and spills over to a hosted provider on error or overload — cheap-and-local by default, resilient when the GPU is busy.
Apply the same guardrails and observability. Your Agent firewall rules and request tracing work on local inference exactly as they do on hosted calls — inspect, redact, or block before the prompt ever reaches the model, and see every local hop in the request log.

Two ways to register a local model

1. OpenAI-compatible endpoint via the config file

Most local servers (Ollama, vLLM, LM Studio, llama.cpp) expose an OpenAI-compatible /v1 API. Declare them as a provider in bitrouter.yaml with an api_base pointing at the local URL:

providers:
  ollama:
    api_base: http://localhost:11434/v1
    api_protocol:
      - "*": chat_completions
    models:
      - id: llama3.1

providers is a map keyed by the provider id you choose (ollama here). api_base is the server's base URL; api_protocol is the wire format BitRouter speaks upstream — chat_completions for any OpenAI-compatible server. BitRouter already infers chat_completions for any non-Anthropic, non-Google host, so the api_protocol block is optional, but stating it keeps the intent explicit. Each entry under models is the model id the server serves.

The full step-by-step — with Ollama, vLLM, LM Studio, and llama.cpp variants and the exact start commands — is in the local model servers recipe.

2. A registry-listed provider, auto-detected from the environment

If your private endpoint is a provider already listed in the registry, you don't need a config file at all. Set its key as BITROUTER_<PROVIDER_ID>_API_KEY and BitRouter detects it at startup — see BYOK for the full env-var convention. This is the path for a private hosted deployment that ships a registry manifest, rather than a raw localhost server.

Local servers usually need no API key. Ollama, vLLM, LM Studio, and llama.cpp accept anonymous requests on loopback by default, so you can leave api_key off the provider entry. Add one only if you've put auth in front of your server — api_key: ${MY_LOCAL_KEY} resolves from the environment at load time.

Free, no cloud account

Local mode is the open-source binary running against your own hardware: there's no BitRouter account, no metering, and nothing to pay. You only pay a provider when you route to a hosted one — and even then BitRouter takes no cut. If you'd rather not run the server yourself and want agents to pay hosted models per request, that's the managed provider path, which is where pricing applies.

Local & private models

Why route a local model through BitRouter

Two ways to register a local model

1. OpenAI-compatible endpoint via the config file

2. A registry-listed provider, auto-detected from the environment

Free, no cloud account

On this page