Custom providers

There are three ways to add a provider that ZeroClaw does not ship with:

Use the custom slot. For any OpenAI-compatible endpoint not covered by an existing canonical slot.
Use the first-class local-server slots (lmstudio, llamacpp, sglang, vllm, osaurus, litellm). Thin wrappers with sensible defaults.
Implement the ModelProvider trait in Rust. For anything that’s not OpenAI-compatible.

OpenAI-compatible endpoint — use the `custom` slot

If the service speaks OpenAI chat-completions, a config change is all you need:

[providers.models.custom.gateway]
uri     = "https://my-gateway.example.com/v1"
model   = "my-model-id"
api_key = "..."                          # omit if the endpoint needs no auth

The custom slot requires uri (the family’s endpoint enum has no default). Reference it from an agent:

[agents.assistant]
model_provider = "custom.gateway"
risk_profile   = "hardened"

This is the same OpenAiCompatibleModelProvider runtime impl used by groq, mistral, xai, and every other vendor with its own canonical slot in the catalog. The difference is which family slot you use — custom is the catch-all for endpoints not represented by a vendor slot.

First-class local inference servers

ZeroClaw ships canonical slots for popular local-inference stacks. They’re all OpenAI-compatible under the hood but with default uri values pre-applied so you can usually omit uri entirely.

llama.cpp — slot `llamacpp`

llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja -c 133000 --host 127.0.0.1 --port 8033

[providers.models.llamacpp.local]
uri   = "http://127.0.0.1:8033/v1"       # omit to use the family default http://localhost:8080/v1
model = "ggml-org/gpt-oss-20b-GGUF"
# api_key only required if llama-server was started with --api-key

Optional fields (apply to any compat-slot family, including llamacpp):

Field	Type	Default	Description
`think`	`bool`	—	Sets `enable_thinking` at the top level of the request body. `false` signals thinking-capable models to skip chain-of-thought.
`chat_template_kwargs`	table	—	Passed verbatim as `chat_template_kwargs` to the Jinja chat template. Use for model-family-specific template variables.
`max_tokens`	`u32`	—	Maximum output tokens per response.
`timeout_secs`	`u64`	120	Request timeout for non-streaming calls.

Controlling thinking mode varies by model family. think = false sets the top-level enable_thinking field in the request. Some models (e.g. Qwen3) read this flag from the Jinja template via chat_template_kwargs instead:

[providers.models.llamacpp.qwen3]
uri = "http://127.0.0.1:8033/v1"
model = "Qwen/Qwen3-30B-A3B-GGUF"
think = false
# Qwen3 reads enable_thinking from the Jinja template, not the top-level field:
chat_template_kwargs = { enable_thinking = false }

Other model families use different template variable names — check your model’s chat template and set the appropriate key under chat_template_kwargs.
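
To make the difference between the two settings concrete, here is a minimal sketch of where each one would land in the request body. It assumes an otherwise standard chat-completions body with `messages` left out for brevity. The function `render_body` is hypothetical and is not part of ZeroClaw; only the field names `enable_thinking` and `chat_template_kwargs` come from the table above.

```rust
/// Hypothetical helper: renders the thinking-related part of a request body.
/// `think` mirrors the `think` config field. `template_kwargs` mirrors
/// `chat_template_kwargs`, restricted to boolean values to keep the sketch short.
/// No JSON escaping is done, so `model` and the keys must be plain ASCII identifiers.
pub fn render_body(model: &str, think: Option<bool>, template_kwargs: &[(&str, bool)]) -> String {
    let mut fields = vec![format!("\"model\":\"{}\"", model)];

    // `think` becomes a top-level `enable_thinking` field.
    if let Some(flag) = think {
        fields.push(format!("\"enable_thinking\":{}", flag));
    }

    // `chat_template_kwargs` is nested, so the Jinja template can read it.
    if !template_kwargs.is_empty() {
        let inner: Vec<String> = template_kwargs
            .iter()
            .map(|(key, value)| format!("\"{}\":{}", key, value))
            .collect();
        fields.push(format!("\"chat_template_kwargs\":{{{}}}", inner.join(",")));
    }

    format!("{{{}}}", fields.join(","))
}
```

Calling `render_body("Qwen/Qwen3-30B-A3B-GGUF", Some(false), &[("enable_thinking", false)])` yields a body that carries the flag in both places, which is what the Qwen3 config above asks for.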

SGLang — slot `sglang`

python -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --port 30000

[providers.models.sglang.local]
uri   = "http://localhost:30000/v1"      # family default
model = "meta-llama/Llama-3.1-8B-Instruct"

vLLM — slot `vllm`

vllm serve meta-llama/Llama-3.1-8B-Instruct

[providers.models.vllm.local]
uri   = "http://localhost:8000/v1"       # family default
model = "meta-llama/Llama-3.1-8B-Instruct"

LM Studio, Osaurus, LiteLLM

Slots lmstudio, osaurus, litellm follow the same pattern — see the catalog.
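
The default-uri behaviour of these slots can be pictured with a short sketch. It is an illustration only: `family_default` and `resolve_uri` are hypothetical names, not ZeroClaw functions, and the table of defaults contains only the values quoted on this page.

```rust
/// Family defaults quoted on this page. Slots whose default is not quoted here
/// (and `custom`, which has none) return `None` in this sketch.
pub fn family_default(slot: &str) -> Option<&'static str> {
    match slot {
        "llamacpp" => Some("http://localhost:8080/v1"),
        "sglang" => Some("http://localhost:30000/v1"),
        "vllm" => Some("http://localhost:8000/v1"),
        _ => None,
    }
}

/// An explicitly configured `uri` wins; otherwise fall back to the family default.
/// A slot with neither is a configuration error, as with `custom`.
pub fn resolve_uri(slot: &str, configured: Option<&str>) -> Result<String, String> {
    match configured.or_else(|| family_default(slot)) {
        Some(uri) => Ok(uri.to_string()),
        None => Err(format!("slot `{}` has no default uri; set `uri` explicitly", slot)),
    }
}
```

With this model, `resolve_uri("vllm", None)` falls back to the family default, while `resolve_uri("custom", None)` fails because the custom slot requires `uri`.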

Validation

Regardless of approach:

zeroclaw config list                          # loads config; any validation failures print to stderr
zeroclaw models refresh --provider <type>.<alias>   # list models the endpoint advertises
zeroclaw agent -a <alias> -m "hello"          # smoke-test against the agent at `[agents.<alias>]`

Implementing the `ModelProvider` trait for a new provider

If the endpoint isn’t OpenAI-compatible and isn’t one of the local-server slots, you need code.

The trait lives in crates/zeroclaw-api/src/model_provider.rs:

#[async_trait]
pub trait ModelProvider: Send + Sync {
    fn name(&self) -> &str;
    fn supports_streaming(&self) -> bool { true }
    fn supports_streaming_tool_events(&self) -> bool { false }

    async fn chat(
        &self,
        messages: Vec<Message>,
        tools: Vec<ToolSchema>,
        options: ChatOptions,
    ) -> Pin<Box<dyn Stream<Item = Result<StreamEvent>> + Send>>;
}

Implementation pattern:

Define the typed config in crates/zeroclaw-config/src/schema.rs:

pub struct MyProviderModelProviderConfig {
    #[serde(flatten)]
    pub base: ModelProviderConfig,
    pub endpoint: MyProviderEndpoint,
    // family-specific fields
}

pub enum MyProviderEndpoint { Default }
impl ModelEndpoint for MyProviderEndpoint {
    fn uri(&self) -> &'static str {
        match self { Self::Default => "https://my-provider.example.com/v1" }
    }
}

Add the slot to for_each_model_provider_slot! in crates/zeroclaw-config/src/providers.rs. Every helper picks up the new slot automatically.
Add the runtime impl in crates/zeroclaw-providers/src/myprovider.rs. Translate Vec<Message> to the wire format, stream the response, emit StreamEvent values.
Wire the factory branch in crates/zeroclaw-providers/src/lib.rs::create_provider_with_url_and_options.
Add a feature flag in Cargo.toml if the provider pulls heavy deps.

See anthropic.rs for an example of a provider with a fully custom wire format. See compatible.rs for the OpenAI-compatible pattern with SSE streaming.
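
The core of the runtime impl is two translations: messages out to the wire, and response chunks back into events. The sketch below shows that shape for a made-up line-based protocol. Every type and function in it is a hypothetical stand-in: the real `Message` and `StreamEvent` live in zeroclaw-api and may look quite different, and the wire format (`role<TAB>content` out, `delta <text>` and `end` in) is invented purely for illustration.

```rust
// Hypothetical stand-ins for the zeroclaw-api types.
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
}

pub struct Message {
    pub role: Role,
    pub content: String,
}

#[derive(Debug, PartialEq)]
pub enum StreamEvent {
    TextDelta(String),
    Done,
}

/// Outbound: one `role<TAB>content` line per message (invented format).
pub fn to_wire(messages: &[Message]) -> String {
    messages
        .iter()
        .map(|m| {
            let role = match m.role {
                Role::System => "system",
                Role::User => "user",
                Role::Assistant => "assistant",
            };
            // Keep one message per line by escaping embedded newlines.
            format!("{}\t{}", role, m.content.replace('\n', "\\n"))
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Inbound: map one response line to an event; unknown lines are skipped.
pub fn parse_line(line: &str) -> Option<StreamEvent> {
    // Strip only the line terminator so trailing spaces in a delta survive.
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    if line == "end" {
        return Some(StreamEvent::Done);
    }
    line.strip_prefix("delta ")
        .map(|text| StreamEvent::TextDelta(text.to_string()))
}
```

In a real provider, `chat` would send the output of something like `to_wire` over HTTP and feed each received line through something like `parse_line`, yielding the resulting events from the returned stream.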

Troubleshooting

Authentication errors

Verify the API key matches the endpoint (many vendors use key prefixes — sk-, gsk_, sk-ant-).
Check that uri includes the scheme (http:// / https://) and the /v1 path if the endpoint expects it.
Endpoints behind a VPN or proxy? Confirm routing from the ZeroClaw host.
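
The uri check in this list is mechanical enough to script. The following is a minimal sketch, not a ZeroClaw API: `check_uri` and `UriProblem` are hypothetical names, and the `/v1` check only makes sense for endpoints that actually expect that path.

```rust
/// Problems a provider `uri` can have before any request is sent.
#[derive(Debug, PartialEq)]
pub enum UriProblem {
    MissingScheme,
    MissingV1Path,
}

/// Returns every problem found in `uri`; an empty vector means both checks passed.
pub fn check_uri(uri: &str) -> Vec<UriProblem> {
    let mut problems = Vec::new();

    if !(uri.starts_with("http://") || uri.starts_with("https://")) {
        problems.push(UriProblem::MissingScheme);
    }

    // Ignore a trailing slash so ".../v1/" is accepted too.
    if !uri.trim_end_matches('/').ends_with("/v1") {
        problems.push(UriProblem::MissingV1Path);
    }

    problems
}
```

For example, `check_uri("my-gateway.example.com/v1")` reports `MissingScheme`, and `check_uri("http://127.0.0.1:8033")` reports `MissingV1Path`.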

Model not found

List the models the endpoint serves:

curl -sS "$URI/models" -H "Authorization: Bearer $API_KEY" | jq

If the endpoint doesn’t implement /models, send a direct chat request and read the error — most endpoints return the expected model family in the error body.
Gateway services often expose only a subset of upstream models.

Connection problems

curl -I $URI — does it respond?
Firewall, proxy, egress rules? VPS providers sometimes block outbound high ports.
Vendor status page if it’s a hosted service.

See also

Overview — provider model and how per-agent dispatch works
Configuration — full [providers.*] schema, Azure typed config, regional and OAuth variants
Catalog — every canonical slot with a worked TOML example
Developing → Plugin protocol: if a plugin is a better fit than a native crate