Multi-Model Setup
A walkthrough of the common patterns for using multiple model providers: per-agent dispatch, cost tiering, local-first with hosted backup, API key rotation, and rate-limit handling.
Reference material for the provider system lives in:
- Model Providers → Overview: what providers are, configuration shape
- Model Providers → Routing: per-agent dispatch and OpenRouter
- Model Providers → Catalog: every provider’s config shape
When to use multi-model setup
Multi-model configuration is useful for:
- Cost tiering: cheap model handles high-volume channels; reasoning model handles complex requests
- Capability routing: vision-capable model for image-bearing channels, reasoning model for research workflows
- Local-first development: local Ollama for development, hosted endpoint for production
- Per-team isolation: different teams use different agents with different model_providers and credentials
- Rate-limit handling: rotate through API keys on 429 (rate limit) responses
Core idea: per-agent dispatch
Each [agents.<alias>] entry points at exactly one [providers.models.<type>.<alias>]. If the model goes down, the agent goes down; the operator routes affected channels to a different agent. See Routing for the full pattern.
To run multiple models, run multiple agents, each binding to one model provider. Each channel binds to one agent at a time. To move a channel to a different agent, edit the channels list on the agent that should pick it up; Config::validate() makes sure references resolve at startup.
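The dispatch rule above can be modeled in a few lines. The following is a minimal Rust sketch under stated assumptions. The `Config`, `Agent`, `provider_for_channel` and `sample_config` names, the `"<type>.<alias>"` key format, and the duplicate-channel check are all hypothetical illustrations, not ZeroClaw's real types or its actual `Config::validate()` logic. It shows that a channel resolves to one agent, that the agent resolves to one provider, and that there is no fallback path.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical, simplified config shapes; ZeroClaw's real types differ.
struct Agent {
    model_provider: String, // key of exactly one provider entry, e.g. "anthropic.fast"
    channels: Vec<String>,  // channels this agent serves
}

struct Config {
    providers: HashMap<String, String>, // "<type>.<alias>" -> model name (illustrative)
    agents: BTreeMap<String, Agent>,    // agent alias -> agent
}

impl Config {
    // In the spirit of Config::validate(): every provider reference must resolve.
    // The duplicate-channel check is an assumption added for illustration.
    fn validate(&self) -> Result<(), String> {
        let mut owner: HashMap<&str, &str> = HashMap::new();
        for (alias, agent) in &self.agents {
            if !self.providers.contains_key(&agent.model_provider) {
                return Err(format!("agent '{alias}': unknown provider '{}'", agent.model_provider));
            }
            for ch in &agent.channels {
                if let Some(prev) = owner.insert(ch.as_str(), alias.as_str()) {
                    return Err(format!("channel '{ch}' bound to both '{prev}' and '{alias}'"));
                }
            }
        }
        Ok(())
    }

    // Dispatch: channel -> one agent -> one provider. No fallback if that provider is down.
    fn provider_for_channel(&self, channel: &str) -> Option<&str> {
        self.agents
            .values()
            .find(|a| a.channels.iter().any(|c| c == channel))
            .map(|a| a.model_provider.as_str())
    }
}

// Hypothetical example: a hosted "frontline" agent on Slack and a CLI-only local "dev" agent.
fn sample_config() -> Config {
    let providers = HashMap::from([
        ("anthropic.fast".to_string(), "haiku".to_string()),
        ("ollama.local".to_string(), "llama3".to_string()),
    ]);
    let agents = BTreeMap::from([
        ("frontline".to_string(), Agent { model_provider: "anthropic.fast".into(), channels: vec!["slack".into()] }),
        ("dev".to_string(), Agent { model_provider: "ollama.local".into(), channels: vec![] }),
    ]);
    Config { providers, agents }
}

fn main() {
    let cfg = sample_config();
    cfg.validate().expect("config should validate");
    println!("slack -> {:?}", cfg.provider_for_channel("slack"));
}
```

Moving a channel in this model means editing one agent's `channels` list. Validation then either accepts the new binding or rejects a dangling reference at startup.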
Cross-vendor reliability: use OpenRouter
OpenRouter is treated as a single first-class provider. It handles vendor fan-out and uptime behind one endpoint. If your goal is “one provider goes down, automatically use another”, that’s OpenRouter’s job, not ZeroClaw’s. The runtime sees one provider; OpenRouter does the cross-vendor work upstream.
Same-vendor retry
For transient errors (network blip, 503, timeout) against the same provider, ZeroClaw retries with exponential backoff, configurable globally under [reliability] (defaults: 2 retries, 500 ms initial backoff). These retries stay inside one provider.
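The retry behavior can be sketched as follows. This is a minimal Rust illustration, not ZeroClaw's implementation. The `Reliability` struct and its field names are hypothetical. The sketch also assumes the backoff doubles after each attempt, which the defaults above do not specify. Sleeping is injected as a parameter so the logic can be exercised without real delays.

```rust
use std::time::Duration;

// Hypothetical mirror of the [reliability] settings; field names are illustrative.
struct Reliability {
    max_retries: u32,
    initial_backoff: Duration,
}

impl Default for Reliability {
    // Documented defaults: 2 retries, 500 ms initial backoff.
    fn default() -> Self {
        Reliability { max_retries: 2, initial_backoff: Duration::from_millis(500) }
    }
}

// Delay before each retry, assuming the backoff doubles after every attempt.
fn backoff_schedule(cfg: &Reliability) -> Vec<Duration> {
    (0..cfg.max_retries).map(|n| cfg.initial_backoff * 2u32.pow(n)).collect()
}

// Retries a call against ONE provider. Non-transient errors return immediately;
// transient ones are retried until the schedule runs out, then the last error is returned.
fn call_with_retry<T, E>(
    cfg: &Reliability,
    mut call: impl FnMut() -> Result<T, E>,
    is_transient: impl Fn(&E) -> bool,
    sleep: impl Fn(Duration),
) -> Result<T, E> {
    let mut delays = backoff_schedule(cfg).into_iter();
    loop {
        match call() {
            Ok(value) => return Ok(value),
            Err(err) if is_transient(&err) => match delays.next() {
                Some(delay) => sleep(delay),
                None => return Err(err), // retries exhausted
            },
            Err(err) => return Err(err), // e.g. 400: retrying won't help
        }
    }
}

fn main() {
    let cfg = Reliability::default();
    // With the defaults: 500 ms before the first retry, 1000 ms before the second.
    println!("{:?}", backoff_schedule(&cfg));
    let result: Result<(), &str> = call_with_retry(&cfg, || Err("503"), |_| true, std::thread::sleep);
    println!("{result:?}");
}
```

With two retries, a persistently failing provider is called three times in total (one attempt plus two retries) before the error is returned.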
API key rotation
For providers that frequently encounter rate limits, supply additional API keys on the provider entry that ZeroClaw rotates through on 429 responses. The primary api_key is always tried first; extras are rotated on rate-limit errors. All keys must belong to the same provider account class; this is rate-limit smoothing, not multi-tenant key juggling.
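The rotation order can be expressed as a small Rust sketch. `KeyPool`, `CallError` and `call_with_rotation` are hypothetical names, and the real rotation may differ in detail. The sketch shows the rule as stated: the primary key first, extra keys only after a 429, an immediate return on any other error, and failure once every key has been rate-limited.

```rust
// Hypothetical sketch; type and field names are illustrative, not ZeroClaw's API.
#[derive(Debug, PartialEq)]
enum CallError {
    RateLimited, // HTTP 429
    Other(u16),  // any other failure, by status code
}

struct KeyPool {
    primary: String,     // the provider entry's api_key
    extras: Vec<String>, // additional keys from the same provider account
}

impl KeyPool {
    // The primary key is always tried first; extras follow in order.
    fn keys_in_order(&self) -> impl Iterator<Item = &str> {
        std::iter::once(self.primary.as_str()).chain(self.extras.iter().map(String::as_str))
    }
}

// Moves to the next key only on 429; any other error is returned immediately.
fn call_with_rotation<T>(
    pool: &KeyPool,
    mut call: impl FnMut(&str) -> Result<T, CallError>,
) -> Result<T, CallError> {
    for key in pool.keys_in_order() {
        match call(key) {
            Ok(value) => return Ok(value),
            Err(CallError::RateLimited) => continue, // rotate to the next key
            Err(other) => return Err(other),
        }
    }
    Err(CallError::RateLimited) // every key was rate-limited: surface to the channel
}

fn main() {
    let pool = KeyPool { primary: "key-a".into(), extras: vec!["key-b".into(), "key-c".into()] };
    // Simulate a provider that rate-limits every key except key-c.
    let result = call_with_rotation(&pool, |key| {
        if key == "key-c" { Ok(format!("served with {key}")) } else { Err(CallError::RateLimited) }
    });
    println!("{result:?}");
}
```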
Local development with hosted alternative
Run a local-Ollama agent and a hosted-provider agent side by side; route each channel to whichever you want it to use.
The dev agent runs from the CLI; it needs no channel binding, and zeroclaw agent -a dev is enough to start it. When Ollama is down, the dev agent fails fast and surfaces the error, while the prod channels are unaffected.
Cost tiering: heavy model when needed, fast model otherwise
Run two agents and route channels to the appropriate tier. The delegate tool lets one agent hand off to another mid-conversation. Delegation is gated: the caller’s risk profile must set delegation_policy mode = "allow", and both agents must share the same risk profile (delegation does not cross trust tiers). In this pattern, the frontline and heavy agents therefore run on the same trusted risk profile; they differ in model and runtime profile (iteration budget), not in trust surface.
The frontline agent handles every inbound message on Haiku. When it needs deeper reasoning, it calls the delegate tool with agent = "heavy"; because both agents share the trusted risk profile and that profile allows delegation, the heavier agent picks up the sub-task on Opus.
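The two delegation conditions can be sketched as a single predicate. This is a minimal Rust illustration under stated assumptions. `DelegationMode`, `RiskProfile`, `Agent` and `can_delegate` are hypothetical names. `Deny` stands in for any setting other than "allow", and the sketch denies delegation when the caller's profile cannot be found.

```rust
// Hypothetical model of the delegation gate described above; names are illustrative.
#[derive(Debug, PartialEq)]
enum DelegationMode {
    Allow,
    Deny, // hypothetical stand-in for any non-"allow" setting
}

struct RiskProfile {
    name: &'static str,
    delegation_mode: DelegationMode,
}

struct Agent {
    model: &'static str,
    risk_profile: &'static str,
}

// Delegation is allowed only if the caller's profile allows it AND
// both agents share the same risk profile (no crossing trust tiers).
fn can_delegate(caller: &Agent, target: &Agent, profiles: &[RiskProfile]) -> bool {
    let Some(profile) = profiles.iter().find(|p| p.name == caller.risk_profile) else {
        return false; // unknown profile: deny
    };
    profile.delegation_mode == DelegationMode::Allow && caller.risk_profile == target.risk_profile
}

fn main() {
    let profiles = [RiskProfile { name: "trusted", delegation_mode: DelegationMode::Allow }];
    let frontline = Agent { model: "haiku", risk_profile: "trusted" };
    let heavy = Agent { model: "opus", risk_profile: "trusted" };
    println!("{} -> {}: {}", frontline.model, heavy.model, can_delegate(&frontline, &heavy, &profiles));
}
```

The model field plays no part in the check. The frontline and heavy agents can differ freely there, while trust stays identical.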
Error handling
Inside-one-provider retries trigger on:
- Timeout: provider did not respond within the configured timeout
- Connection error: network or DNS failure
- Rate limit (429): triggers API key rotation first; if all keys are exhausted, the error surfaces to the channel
- Service unavailable (503): temporary service issue
Retries are NOT triggered by:
- Invalid request (400): malformed input; retrying won’t help
- Permanent auth failure: invalid API key format
- Model output errors: the model responded but returned an error payload
When all retries against a single provider are exhausted, the failure surfaces to the calling channel. There is no automatic cross-provider retry; that is what OpenRouter, or splitting traffic across multiple agents, is for.
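The two lists above amount to a classification table. The Rust sketch below restates them as a function. `ProviderError`, `Action` and `classify` are hypothetical names that mirror the documented categories, not ZeroClaw's real error types. No branch leads to another provider.

```rust
// Hypothetical error taxonomy mirroring the two lists above; not ZeroClaw's real types.
#[derive(Debug, Clone, Copy)]
enum ProviderError {
    Timeout,     // no response within the configured timeout
    Connection,  // network or DNS failure
    RateLimited, // HTTP 429
    Unavailable, // HTTP 503
    BadRequest,  // HTTP 400, malformed input
    AuthInvalid, // permanent auth failure, e.g. bad key format
    ModelOutput, // model responded, but with an error payload
}

#[derive(Debug, PartialEq)]
enum Action {
    RetryWithBackoff, // same provider, same key
    RotateKey,        // next key in the pool; fail once all keys are exhausted
    Surface,          // return the error to the calling channel
}

fn classify(err: ProviderError) -> Action {
    use ProviderError::*;
    match err {
        Timeout | Connection | Unavailable => Action::RetryWithBackoff,
        RateLimited => Action::RotateKey,
        BadRequest | AuthInvalid | ModelOutput => Action::Surface,
    }
}

fn main() {
    for err in [ProviderError::Timeout, ProviderError::RateLimited, ProviderError::BadRequest] {
        println!("{err:?} -> {:?}", classify(err));
    }
}
```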
Debugging
Persisted logs (the default mode is "rolling") capture retry and key-rotation behavior. Query the traces with:
```sh
zeroclaw doctor traces --contains "retry"
zeroclaw doctor traces --contains "429"
zeroclaw doctor traces --contains "model_provider"
```
Best practices
- One agent per routing intent. If two channels need different model behavior, name two agents.
- Use OpenRouter for cross-vendor reliability. Cross-vendor “if Claude fails, try OpenAI” is OpenRouter’s job; configure it as one provider and let its endpoint handle the fan-out.
- Keep API key rotation pools homogeneous. All keys in [reliability] api_keys should be from the same provider account; this is rate-limit smoothing, not multi-tenancy.
- Smoke-test each agent in isolation. zeroclaw agent -a <alias> runs an agent without channel plumbing in the way.
- Document agent intent. Add # comment lines explaining which channels each agent serves and why.
- Inject secrets via env, not inline. ZEROCLAW_providers__models__<type>__<alias>__api_key=... sets api_key at startup; see Environment variables.
- Separate dev and prod agents. Each environment gets its own [agents.<alias>] entry bound to its own channels.
Credential resolution
Each provider entry resolves credentials in this order:
- Inline api_key on the provider entry.
- Secrets store at ~/.zeroclaw/secrets.
- Generic env override: ZEROCLAW_providers__models__<type>__<alias>__api_key=... at startup. See Environment variables for the full grammar.
- Per-vendor env var, when the family supports it (e.g. ANTHROPIC_API_KEY / ANTHROPIC_OAUTH_TOKEN for Anthropic; OPENROUTER_API_KEY for OpenRouter).
Credentials are not shared between providers; set them on each provider entry.
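The lookup order can be sketched as a first-match chain in Rust. The following is an illustration under stated assumptions. `ProviderEntry` and `resolve_api_key` are hypothetical names. Keying the secrets store by `"<type>.<alias>"` is an assumption, and so is checking the per-vendor variables in the order the caller lists them. The environment is passed in as a map so the order can be tested without touching the real process environment.

```rust
use std::collections::HashMap;

// Hypothetical view of one provider entry; field names are illustrative.
#[derive(Clone, Copy)]
struct ProviderEntry<'a> {
    provider_type: &'a str,          // <type>, e.g. "anthropic"
    alias: &'a str,                  // <alias>, e.g. "fast"
    inline_api_key: Option<&'a str>, // api_key written directly in the entry
}

// Walks the documented order and returns the first key found.
// `secrets` and `env` are passed in as maps so the sketch is testable;
// keying the secrets store by "<type>.<alias>" is an assumption.
fn resolve_api_key(
    entry: &ProviderEntry<'_>,
    secrets: &HashMap<String, String>,
    env: &HashMap<String, String>,
    vendor_vars: &[&str], // e.g. ["ANTHROPIC_API_KEY", "ANTHROPIC_OAUTH_TOKEN"]
) -> Option<String> {
    // 1. Inline api_key on the provider entry.
    if let Some(key) = entry.inline_api_key {
        return Some(key.to_string());
    }
    // 2. Secrets store (~/.zeroclaw/secrets).
    if let Some(key) = secrets.get(&format!("{}.{}", entry.provider_type, entry.alias)) {
        return Some(key.clone());
    }
    // 3. Generic env override.
    let generic = format!("ZEROCLAW_providers__models__{}__{}__api_key", entry.provider_type, entry.alias);
    if let Some(key) = env.get(&generic) {
        return Some(key.clone());
    }
    // 4. Per-vendor env vars, checked in the order given.
    vendor_vars.iter().find_map(|var| env.get(*var).cloned())
}

fn main() {
    let env: HashMap<String, String> = std::env::vars().collect();
    let entry = ProviderEntry { provider_type: "anthropic", alias: "fast", inline_api_key: None };
    let key = resolve_api_key(&entry, &HashMap::new(), &env, &["ANTHROPIC_API_KEY", "ANTHROPIC_OAUTH_TOKEN"]);
    println!("key found: {}", key.is_some());
}
```

Because the function takes a single entry, two provider entries never see each other's inline key or secrets-store key.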