Multi-Model Setup
A walkthrough of the common patterns for using multiple model providers: per-agent dispatch, cost tiering, local-first with hosted backup, API key rotation, and rate-limit handling.
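Reference material lives in the following parts of the provider system docs:
- Model Providers → Overview: what a provider is, config shape
- Model Providers → Routing: per-agent dispatch and OpenRouter
- Model Providers → Catalog: per-provider config shape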
When to use multi-model setup
A multi-model setup is a good fit for:
- Cost tiering: cheap model handles high-volume channels; reasoning model handles complex requests
- Capability routing: vision-capable model for image-bearing channels, reasoning model for research workflows
- Local-first development: local Ollama for development, hosted endpoint for production
- Per-team isolation: different teams use different agents with different model_providers and credentials
- Rate-limit handling: rotate through API keys on 429 (rate limit) responses
Core idea — per-agent dispatch
Each [agents.<alias>] entry points at exactly one [providers.models.<type>.<alias>]. If the model goes down, the agent goes down; the operator routes affected channels to a different agent. See Routing for the full pattern.
To run multiple models, run multiple agents:
[providers.models.anthropic.haiku]
model = "claude-haiku-4-5-20251001"
api_key = "sk-ant-..."
[providers.models.anthropic.sonnet]
model = "claude-sonnet-4-6"
api_key = "sk-ant-..."
[providers.models.deepseek.reasoner]
model = "deepseek-reasoner"
api_key = "sk-..."
[channels.telegram.home]
bot_token = "..."
[channels.slack.engineering]
bot_token = "..."
[channels.slack.research]
bot_token = "..."
[agents.fast]
model_provider = "anthropic.haiku"
risk_profile = "hardened"
runtime_profile = "tight" # fewer iterations for snappy public replies
channels = ["telegram.home"]
[agents.deep]
model_provider = "anthropic.sonnet"
risk_profile = "hardened"
runtime_profile = "deep" # higher iteration cap for engineering tasks
channels = ["slack.engineering"]
[agents.reasoner]
model_provider = "deepseek.reasoner"
risk_profile = "hardened"
runtime_profile = "deep" # extended chains for research-style prompts
channels = ["slack.research"]
# Shared `hardened` posture across the three public-facing agents,
# distinct `tight` / `deep` runtime profiles matching each agent's
# throughput intent. `risk_profile` and `runtime_profile` are independent maps.
[risk_profiles.hardened]
level = "supervised"
workspace_only = true
require_approval_for_medium_risk = true
block_high_risk_commands = true
[runtime_profiles.tight]
max_tool_iterations = 5
max_actions_per_hour = 30
[runtime_profiles.deep]
max_tool_iterations = 50
max_actions_per_hour = 200
Each channel binds to one agent at a time. To move a channel to a different agent, edit the channels = [...] list on the agent that should pick it up — Config::validate() makes sure references resolve at startup.
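As a rough illustration of what "references resolve" means here, the Python sketch below checks a parsed config (represented as plain dicts) for agents that name a missing provider or channel. The helper name find_config_problems and the dict layout are assumptions made for this example, and the duplicate-channel check is an assumption drawn from "each channel binds to one agent at a time"; this is not the actual Config::validate() logic.

```python
def find_config_problems(config):
    """Return human-readable problems found in a parsed config dict.

    Hypothetical helper: mirrors the idea of startup validation only.
    """
    problems = []
    # Flatten [providers.models.<type>.<alias>] into "type.alias" keys.
    providers = {
        f"{ptype}.{alias}"
        for ptype, aliases in config.get("providers", {}).get("models", {}).items()
        for alias in aliases
    }
    # Flatten [channels.<type>.<alias>] the same way.
    channels = {
        f"{ctype}.{alias}"
        for ctype, aliases in config.get("channels", {}).items()
        for alias in aliases
    }
    owner = {}  # channel -> first agent that claimed it
    for name, agent in config.get("agents", {}).items():
        provider = agent.get("model_provider")
        if provider not in providers:
            problems.append(f"agent {name}: unknown model_provider {provider!r}")
        for channel in agent.get("channels", []):
            if channel not in channels:
                problems.append(f"agent {name}: unknown channel {channel!r}")
            elif channel in owner:
                # Assumption: a channel may be listed by one agent only.
                problems.append(
                    f"channel {channel!r} claimed by both {owner[channel]} and {name}"
                )
            else:
                owner[channel] = name
    return problems


config = {
    "providers": {"models": {"anthropic": {"haiku": {}, "sonnet": {}}}},
    "channels": {"telegram": {"home": {}}, "slack": {"engineering": {}}},
    "agents": {
        "fast": {"model_provider": "anthropic.haiku", "channels": ["telegram.home"]},
        # slack.research is not declared under [channels], so it is reported.
        "deep": {"model_provider": "anthropic.sonnet", "channels": ["slack.research"]},
    },
}
print(find_config_problems(config))
# ["agent deep: unknown channel 'slack.research'"]
```

Running a check like this before moving a channel between agents catches a mistyped alias without needing a full startup.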
Cross-vendor reliability — use OpenRouter
OpenRouter is treated as a single first-class provider. It handles vendor fan-out and uptime behind one endpoint:
[providers.models.openrouter.home]
model = "anthropic/claude-sonnet-4-20250514"
api_key = "sk-or-..."
[agents.assistant]
model_provider = "openrouter.home"
risk_profile = "hardened"
# runtime_profile omitted — uses runtime defaults
[risk_profiles.hardened]
level = "supervised"
If your goal is “one provider goes down, automatically use another”, that’s OpenRouter’s job — not ZeroClaw’s. The runtime sees one provider; OpenRouter does the cross-vendor work upstream.
Same-vendor retry
For transient errors (network blip, 503, timeout) against the same provider, ZeroClaw retries with exponential backoff. This is configurable globally:
[reliability]
provider_retries = 2 # retries per provider attempt before bailing
provider_backoff_ms = 500 # initial backoff; doubles per retry
Defaults are 2 retries with a 500 ms initial backoff. These retries stay within a single provider; they never switch to a different one.
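To make the doubling concrete, the short Python sketch below computes the wait before each retry from the two settings. The function name backoff_schedule_ms is made up for this example, and the sketch assumes plain doubling with no jitter or upper cap, since the text above does not mention either.

```python
def backoff_schedule_ms(provider_retries=2, provider_backoff_ms=500):
    """Delay before each retry: the initial backoff, doubled per retry.

    Assumption: no jitter and no maximum delay are applied.
    """
    return [provider_backoff_ms * 2 ** attempt for attempt in range(provider_retries)]


print(backoff_schedule_ms())        # [500, 1000]
print(backoff_schedule_ms(4, 250))  # [250, 500, 1000, 2000]
print(sum(backoff_schedule_ms()))   # 1500
```

With the defaults, a request that keeps failing spends about 1.5 seconds waiting between attempts, on top of the time taken by the requests themselves.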
API key rotation
For providers that frequently encounter rate limits, supply additional API keys that ZeroClaw will rotate through on 429 responses:
[reliability]
api_keys = ["sk-key-2", "sk-key-3", "sk-key-4"]
The primary api_key (configured on the provider entry) is always tried first; these extras are rotated on rate-limit errors. All keys must belong to the same provider account class — this is rate-limit smoothing, not multi-tenant key juggling.
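The ordering described above can be sketched in a few lines of Python. Everything named here (call_with_key_rotation, the send callable, the simulated provider) is hypothetical, and the sketch assumes each request starts again from the primary key; whether the runtime remembers the last working key between requests is not stated in this document.

```python
def call_with_key_rotation(primary_key, extra_keys, send):
    """Try the primary key first, then each extra key after a 429.

    `send(key)` is a hypothetical callable returning an HTTP status code.
    Returns (status, key_used) for the first non-429 response, or
    (429, None) when every key is rate limited.
    """
    for key in [primary_key, *extra_keys]:
        status = send(key)
        if status != 429:
            return status, key
    return 429, None


# Simulated provider: the first two keys are currently rate limited.
limited = {"sk-key-1", "sk-key-2"}


def fake_send(key):
    return 429 if key in limited else 200


print(call_with_key_rotation("sk-key-1", ["sk-key-2", "sk-key-3", "sk-key-4"], fake_send))
# (200, 'sk-key-3')
```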
Local development with hosted alternative
Run a local-Ollama agent and a hosted-provider agent side by side; route each channel to whichever you want it to use.
[providers.models.ollama.local]
uri = "http://localhost:11434"
model = "qwen3.6:35b-a3b"
[providers.models.openrouter.home]
model = "anthropic/claude-haiku-4-5-20251001"
api_key = "sk-or-..."
[channels.telegram.production]
bot_token = "..."
[channels.slack.production]
bot_token = "..."
[agents.dev]
model_provider = "ollama.local"
risk_profile = "permissive" # local dev box — looser gates
runtime_profile = "deep" # plenty of tool iterations while developing
[agents.prod]
model_provider = "openrouter.home"
risk_profile = "hardened" # public channels — strict gates
runtime_profile = "tight" # production discipline — short loops, low spend
channels = ["telegram.production", "slack.production"]
[risk_profiles.permissive]
level = "full"
workspace_only = false
[risk_profiles.hardened]
level = "supervised"
workspace_only = true
require_approval_for_medium_risk = true
block_high_risk_commands = true
[runtime_profiles.deep]
max_tool_iterations = 50
max_actions_per_hour = 200
[runtime_profiles.tight]
max_tool_iterations = 5
max_actions_per_hour = 30
The dev agent runs from the CLI (no channel binding required — zeroclaw agent -a dev is enough). When Ollama is down, the dev agent fails fast and surfaces the error. The prod channels are unaffected.
Cost tiering — heavy model when needed, fast model otherwise
Run two agents and route channels to the appropriate tier. The delegate tool lets one agent hand off to another mid-conversation. Each agent picks the risk profile that matches its trust surface: frontline faces public traffic, so it gets a stricter hardened profile; heavy is reached only via delegate from inside the trusted agent loop, so it can run on a looser permissive profile.
[providers.models.anthropic.opus]
model = "claude-opus-4-7"
api_key = "sk-ant-..."
# (no temperature — claude-opus-4-7 rejects any temperature setting)
[providers.models.anthropic.haiku]
model = "claude-haiku-4-5-20251001"
api_key = "sk-ant-..."
[channels.telegram.home]
bot_token = "..."
[agents.frontline]
model_provider = "anthropic.haiku"
risk_profile = "hardened" # public-facing strictness
runtime_profile = "tight" # low iteration cap, fast turn-around
channels = ["telegram.home"]
[agents.heavy]
model_provider = "anthropic.opus"
risk_profile = "permissive" # internal-delegate trust
runtime_profile = "deep" # high iteration cap for chain-of-thought work
# No channels — invoked via the delegate tool from frontline
# risk_profile and runtime_profile reference independent alias maps —
# the names above intentionally differ between the two profile kinds
# to make that clear.
[risk_profiles.hardened]
level = "supervised"
workspace_only = true
require_approval_for_medium_risk = true
block_high_risk_commands = true
[risk_profiles.permissive]
level = "full"
workspace_only = false
[runtime_profiles.tight]
max_tool_iterations = 5
max_actions_per_hour = 30
[runtime_profiles.deep]
max_tool_iterations = 50
max_actions_per_hour = 200
The frontline agent handles every inbound message on Haiku. When it needs deeper reasoning, it calls the delegate tool with agent = "heavy" and the heavier agent picks up the sub-task.
Error handling
Inside-one-provider retries trigger on:
- Timeout: provider did not respond within the configured timeout
- Connection error: network or DNS failure
- Rate limit (429): triggers API key rotation first; if all keys are exhausted, the failure surfaces to the channel
- Service unavailable (503): temporary service issue
Retries are NOT triggered by:
- Invalid request (400): malformed input; retrying won’t help
- Permanent auth failure: invalid API key format
- Model output errors: the model responded but returned an error payload
When all retries are exhausted on a single provider, the failure surfaces to the calling channel. There is no automatic cross-provider retry — that’s the point of using OpenRouter or splitting traffic across multiple agents.
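The two lists above amount to a small decision table, sketched here in Python. The Action enum, the classify function, and the "timeout" / "connection" string labels are illustrative names only; treating every unlisted error as a failure is an assumption of this sketch.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry with backoff"
    ROTATE_KEY = "rotate API key"
    FAIL = "surface to channel"


def classify(error):
    """Map an error to the handling described in the lists above.

    `error` is an HTTP status code or one of the hypothetical labels
    "timeout" / "connection".
    """
    if error in ("timeout", "connection", 503):
        return Action.RETRY
    if error == 429:
        return Action.ROTATE_KEY
    # 400, permanent auth failures, model error payloads, and
    # (by assumption) anything not listed above.
    return Action.FAIL


for err in ("timeout", 503, 429, 400):
    print(err, "->", classify(err).value)
```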
Debugging
Persisted logs ("rolling" is the default) capture retry and key-rotation behaviour:
[observability]
log_persistence = "rolling"
log_persistence_path = "state/runtime-trace.jsonl"
Then query the traces:
zeroclaw doctor traces --contains "retry"
zeroclaw doctor traces --contains "429"
zeroclaw doctor traces --contains "model_provider"
Best practices
- One agent per routing intent. If two channels need different model behavior, name two agents.
- Use OpenRouter for cross-vendor reliability. Cross-vendor “if Claude fails, try OpenAI” is OpenRouter’s job; configure it as one provider and let its endpoint handle the fan-out.
- Keep API key rotation pools homogeneous. All keys in [reliability] api_keys should be from the same provider account; this is rate-limit smoothing, not multi-tenancy.
- Smoke-test each agent in isolation. zeroclaw agent -a <alias> runs an agent without channel plumbing in the way.
- Document agent intent. Add # comment lines explaining which channels each agent serves and why.
- Inject secrets via env, not inline. ZEROCLAW_providers__models__<type>__<alias>__api_key=... sets api_key at startup; see Environment variables.
- Separate dev and prod agents. Each environment gets its own [agents.<alias>] entry bound to its own channels.
Credential resolution
Each provider entry resolves credentials in this order:
- Inline api_key on the provider entry.
- Secrets store at ~/.zeroclaw/secrets.
- Generic env override: ZEROCLAW_providers__models__<type>__<alias>__api_key=... at startup. See Environment variables for the full grammar.
- Per-vendor env var when the family supports it (e.g. ANTHROPIC_API_KEY / ANTHROPIC_OAUTH_TOKEN for Anthropic; OPENROUTER_API_KEY for OpenRouter).
Credentials are not shared between providers — set them per provider entry.
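The Python sketch below shows one way to read the list above: build the generic override name from a "type.alias" reference, then return the first location that holds a value. The function names, the secrets dict keyed by provider reference, and the first-match-wins reading of the order are all assumptions for illustration; the real lookup may differ, particularly in how an env override interacts with an inline key.

```python
def generic_env_name(model_provider):
    """Build the generic override variable for a "type.alias" reference."""
    ptype, alias = model_provider.split(".", 1)
    return f"ZEROCLAW_providers__models__{ptype}__{alias}__api_key"


def resolve_api_key(model_provider, inline_key, secrets, env, vendor_vars=()):
    """Return (source, key) from the first location that has a value.

    Assumption: the documented order is treated as first-match-wins.
    `secrets` stands in for the secrets store as a plain dict.
    """
    candidates = [
        ("inline", inline_key),
        ("secrets store", secrets.get(model_provider)),
        ("generic env", env.get(generic_env_name(model_provider))),
    ]
    candidates += [(f"vendor env {name}", env.get(name)) for name in vendor_vars]
    for source, key in candidates:
        if key:
            return source, key
    return None, None


env = {
    "ZEROCLAW_providers__models__anthropic__haiku__api_key": "sk-ant-env",
    "ANTHROPIC_API_KEY": "sk-ant-vendor",
}
print(generic_env_name("anthropic.haiku"))
# ZEROCLAW_providers__models__anthropic__haiku__api_key
print(resolve_api_key("anthropic.haiku", None, {}, env, ["ANTHROPIC_API_KEY"]))
# ('generic env', 'sk-ant-env')
```

Because each lookup is keyed by the provider reference, a key set for anthropic.haiku is never picked up by anthropic.sonnet unless a per-vendor variable is supplied for both.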