Custom providers

There are three ways to add a provider that ZeroClaw does not ship with:

  1. Use the custom slot. For any OpenAI-compatible endpoint not covered by an existing canonical slot.
  2. Use the first-class local-server slots (lmstudio, llamacpp, sglang, vllm, osaurus, litellm). Thin wrappers with sensible defaults.
  3. Implement the ModelProvider trait in Rust. For anything that’s not OpenAI-compatible.

OpenAI-compatible endpoint — use the custom slot

If the service speaks OpenAI’s chat-completions API, a config change is all you need:

[providers.models.custom.gateway]
uri     = "https://my-gateway.example.com/v1"
model   = "my-model-id"
api_key = "..."                          # omit if the endpoint needs no auth

The custom slot requires uri (the family’s endpoint enum has no default). Reference it from an agent:

[agents.assistant]
model_provider = "custom.gateway"
risk_profile   = "hardened"

This is the same OpenAiCompatibleModelProvider runtime impl used by groq, mistral, xai, and every other vendor with its own canonical slot in the catalog. The difference is which family slot you use — custom is the catch-all for endpoints not represented by a vendor slot.
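The link between the agent's model_provider value and the provider table can be sketched as a simple split on the first dot. This is a minimal illustration, assuming the reference always has the form `<family>.<alias>` as in the examples above; split_provider_ref is a hypothetical helper, not a ZeroClaw function, and the real lookup may handle cases this sketch does not.

```rust
/// Hypothetical helper (not part of ZeroClaw): split a `model_provider`
/// reference such as "custom.gateway" into its family slot and alias.
/// Assumes the family name itself contains no dot.
fn split_provider_ref(reference: &str) -> Option<(&str, &str)> {
    let (family, alias) = reference.split_once('.')?;
    if family.is_empty() || alias.is_empty() {
        return None;
    }
    Some((family, alias))
}

fn main() {
    let (family, alias) =
        split_provider_ref("custom.gateway").expect("well-formed reference");
    // The matching TOML table is [providers.models.<family>.<alias>].
    println!("[providers.models.{family}.{alias}]");
}
```

With the config above, the reference "custom.gateway" points at the [providers.models.custom.gateway] table.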

First-class local inference servers

ZeroClaw ships canonical slots for popular local-inference stacks. They’re all OpenAI-compatible under the hood but with default uri values pre-applied so you can usually omit uri entirely.

llama.cpp — slot llamacpp

llama-server -hf ggml-org/gpt-oss-20b-GGUF --jinja -c 133000 --host 127.0.0.1 --port 8033

[providers.models.llamacpp.local]
uri   = "http://127.0.0.1:8033/v1"       # omit to use the family default http://localhost:8080/v1
model = "ggml-org/gpt-oss-20b-GGUF"
# api_key only required if llama-server was started with --api-key

Optional fields (apply to any compat-slot family, including llamacpp):

| Field | Type | Default | Description |
|---|---|---|---|
| think | bool | | Sets enable_thinking at the top level of the request body. false signals thinking-capable models to skip chain-of-thought. |
| chat_template_kwargs | table | | Passed verbatim as chat_template_kwargs to the Jinja chat template. Use for model-family-specific template variables. |
| max_tokens | u32 | | Maximum output tokens per response. |
| timeout_secs | u64 | 120 | Request timeout for non-streaming calls. |

Controlling thinking mode varies by model family. think = false sets the top-level enable_thinking field in the request. Some models (e.g. Qwen3) read this flag from the Jinja template via chat_template_kwargs instead:

[providers.models.llamacpp.qwen3]
uri = "http://127.0.0.1:8033/v1"
model = "Qwen/Qwen3-30B-A3B-GGUF"
think = false
# Qwen3 reads enable_thinking from the Jinja template, not the top-level field:
chat_template_kwargs = { enable_thinking = false }

Other model families use different template variable names — check your model’s chat template and set the appropriate key under chat_template_kwargs.
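To make the difference between the two knobs concrete, the sketch below renders only the thinking-related part of a request body. It is an illustration under assumptions: thinking_fields is a hypothetical helper, it handles boolean template variables only, and it does no JSON escaping. The field names enable_thinking and chat_template_kwargs are the ones described above.

```rust
/// Hypothetical helper: render the thinking-related request fields as JSON.
/// `think` maps to the top-level `enable_thinking` field; `template_kwargs`
/// is nested under `chat_template_kwargs`. Keys are not escaped.
fn thinking_fields(think: Option<bool>, template_kwargs: &[(&str, bool)]) -> String {
    let mut fields = Vec::new();
    if let Some(flag) = think {
        fields.push(format!("\"enable_thinking\":{flag}"));
    }
    if !template_kwargs.is_empty() {
        let inner: Vec<String> = template_kwargs
            .iter()
            .map(|(key, value)| format!("\"{key}\":{value}"))
            .collect();
        fields.push(format!("\"chat_template_kwargs\":{{{}}}", inner.join(",")));
    }
    format!("{{{}}}", fields.join(","))
}

fn main() {
    // Mirrors the qwen3 example: both the top-level flag and the template variable.
    println!("{}", thinking_fields(Some(false), &[("enable_thinking", false)]));
}
```

For the Qwen3 config above, both fields appear in the body; a model that only reads the template variable would ignore the top-level one.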

SGLang — slot sglang

python -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --port 30000

[providers.models.sglang.local]
uri   = "http://localhost:30000/v1"      # family default
model = "meta-llama/Llama-3.1-8B-Instruct"

vLLM — slot vllm

vllm serve meta-llama/Llama-3.1-8B-Instruct

[providers.models.vllm.local]
uri   = "http://localhost:8000/v1"       # family default
model = "meta-llama/Llama-3.1-8B-Instruct"

LM Studio, Osaurus, LiteLLM

Slots lmstudio, osaurus, litellm follow the same pattern — see the catalog.
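The "explicit uri wins, otherwise the family default applies" rule can be sketched as follows. This is not ZeroClaw's code: family_default_uri and effective_uri are hypothetical names, and the table only contains the three defaults quoted on this page. Defaults for lmstudio, osaurus, and litellm exist according to the text above but are not stated here, so the sketch leaves them out.

```rust
/// Family defaults quoted on this page. Other slots are omitted because
/// their defaults are not stated here; `custom` has none by design.
fn family_default_uri(family: &str) -> Option<&'static str> {
    match family {
        "llamacpp" => Some("http://localhost:8080/v1"),
        "sglang" => Some("http://localhost:30000/v1"),
        "vllm" => Some("http://localhost:8000/v1"),
        _ => None,
    }
}

/// Hypothetical resolution rule: a configured uri overrides the family default.
fn effective_uri(family: &str, configured: Option<&str>) -> Result<String, String> {
    configured
        .or_else(|| family_default_uri(family))
        .map(str::to_owned)
        .ok_or_else(|| format!("no uri configured and no default known in this sketch for `{family}`"))
}

fn main() {
    // llama-server started on a non-default port, so uri is set explicitly.
    println!("{:?}", effective_uri("llamacpp", Some("http://127.0.0.1:8033/v1")));
    // vLLM on its default port: uri can be omitted.
    println!("{:?}", effective_uri("vllm", None));
    // The custom slot always needs a uri.
    println!("{:?}", effective_uri("custom", None));
}
```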

Validation

Regardless of approach:

zeroclaw config list                          # loads config; any validation failures print to stderr
zeroclaw models refresh --provider <type>.<alias>   # list models the endpoint advertises
zeroclaw agent -a <alias> -m "hello"          # smoke-test against the agent at `[agents.<alias>]`

Implementing a new ModelProvider trait

If the endpoint isn’t OpenAI-compatible and isn’t one of the local-server slots, you need code.

The trait lives in crates/zeroclaw-api/src/model_provider.rs:

#[async_trait]
pub trait ModelProvider: Send + Sync {
    fn name(&self) -> &str;
    fn supports_streaming(&self) -> bool { true }
    fn supports_streaming_tool_events(&self) -> bool { false }

    async fn chat(
        &self,
        messages: Vec<Message>,
        tools: Vec<ToolSchema>,
        options: ChatOptions,
    ) -> Pin<Box<dyn Stream<Item = Result<StreamEvent>> + Send>>;
}
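To get a feel for the trait's shape without pulling in an async runtime, here is a std-only, synchronous analogue. It is a sketch, not the real API: SyncModelProvider, EchoProvider, the Message fields, and the StreamEvent variants are all hypothetical stand-ins, an Iterator replaces the async Stream, and the tools and options parameters are omitted. What it carries over from the trait above is the split between the required name and chat methods and the two capability methods that come with defaults.

```rust
// Hypothetical stand-ins; the real Message and StreamEvent types are not shown on this page.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

#[derive(Debug, Clone, PartialEq)]
enum StreamEvent {
    TextDelta(String),
    Done,
}

type EventIter = Box<dyn Iterator<Item = Result<StreamEvent, String>>>;

/// Synchronous analogue of the trait: same required/defaulted method split.
trait SyncModelProvider {
    fn name(&self) -> &str;
    fn supports_streaming(&self) -> bool { true }
    fn supports_streaming_tool_events(&self) -> bool { false }
    fn chat(&self, messages: Vec<Message>) -> EventIter;
}

/// Toy provider that echoes the last user message back as a single delta.
struct EchoProvider;

impl SyncModelProvider for EchoProvider {
    fn name(&self) -> &str { "echo" }

    fn chat(&self, messages: Vec<Message>) -> EventIter {
        let last_user = messages
            .into_iter()
            .rev()
            .find(|m| m.role == "user")
            .map(|m| m.content)
            .unwrap_or_default();
        let events: Vec<Result<StreamEvent, String>> =
            vec![Ok(StreamEvent::TextDelta(last_user)), Ok(StreamEvent::Done)];
        Box::new(events.into_iter())
    }
}

fn main() {
    let provider = EchoProvider;
    let messages = vec![Message { role: "user".into(), content: "hello".into() }];
    println!("{} streaming={}", provider.name(), provider.supports_streaming());
    for event in provider.chat(messages) {
        println!("{event:?}");
    }
}
```

EchoProvider overrides nothing but name and chat, so it inherits supports_streaming = true and supports_streaming_tool_events = false.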

Implementation pattern:

  1. Define the typed config in crates/zeroclaw-config/src/schema.rs:

    pub struct MyProviderModelProviderConfig {
        #[serde(flatten)]
        pub base: ModelProviderConfig,
        pub endpoint: MyProviderEndpoint,
        // family-specific fields
    }

    pub enum MyProviderEndpoint { Default }
    impl ModelEndpoint for MyProviderEndpoint {
        fn uri(&self) -> &'static str {
            match self { Self::Default => "https://my-provider.example.com/v1" }
        }
    }

  2. Add the slot to for_each_model_provider_slot! in crates/zeroclaw-config/src/providers.rs. Every helper picks up the new slot automatically.

  3. Add the runtime impl in crates/zeroclaw-providers/src/myprovider.rs. Translate Vec<Message> to the wire format, stream the response, emit StreamEvent values.

  4. Wire the factory branch in crates/zeroclaw-providers/src/lib.rs::create_provider_with_url_and_options.

  5. Add a feature flag in Cargo.toml if the provider pulls heavy deps.

See anthropic.rs for an example of a provider with a fully custom wire format. See compatible.rs for the OpenAI-compatible pattern with SSE streaming.
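As a rough idea of what the streaming half of step 3 involves for an OpenAI-style endpoint, the sketch below pulls the payloads out of server-sent-event text. It is a std-only illustration, not the code in compatible.rs: sse_data_payloads is a hypothetical helper, it assumes the whole text is already in memory (a real client has to buffer partial lines across network chunks), and it leaves JSON decoding of each payload to the caller.

```rust
/// Hypothetical helper: extract the payload of each `data:` line from SSE
/// text, stopping at the `[DONE]` sentinel that OpenAI-style streams send.
/// Blank lines and `:` comment lines are skipped.
fn sse_data_payloads(raw: &str) -> Vec<&str> {
    let mut payloads = Vec::new();
    for line in raw.lines() {
        let Some(rest) = line.strip_prefix("data:") else {
            continue;
        };
        let payload = rest.trim_start();
        if payload == "[DONE]" {
            break;
        }
        payloads.push(payload);
    }
    payloads
}

fn main() {
    let raw = "data: {\"delta\":\"Hel\"}\n\n: keep-alive\n\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n\n";
    for payload in sse_data_payloads(raw) {
        // Each payload would be decoded and mapped to a StreamEvent.
        println!("{payload}");
    }
}
```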

Troubleshooting

Authentication errors

  • Verify the API key matches the endpoint (many vendors use key prefixes — sk-, gsk_, sk-ant-).
  • Check that uri includes the scheme (http:// / https://) and the /v1 path if the endpoint expects it.
  • Endpoints behind a VPN or proxy? Confirm routing from the ZeroClaw host.
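The uri checks from the list above are easy to automate before blaming the API key. A minimal sketch, assuming nothing about ZeroClaw's own validation: uri_warnings is a hypothetical helper, and the /v1 check is only a hint because not every endpoint expects that path.

```rust
/// Hypothetical helper: flag the two uri mistakes called out above.
fn uri_warnings(uri: &str) -> Vec<&'static str> {
    let mut warnings = Vec::new();
    if !(uri.starts_with("http://") || uri.starts_with("https://")) {
        warnings.push("missing http:// or https:// scheme");
    }
    // A trailing slash after /v1 is tolerated by this check.
    if !uri.trim_end_matches('/').ends_with("/v1") {
        warnings.push("path does not end in /v1 (fine if the endpoint does not expect it)");
    }
    warnings
}

fn main() {
    for uri in ["https://my-gateway.example.com/v1", "my-gateway.example.com"] {
        println!("{uri}: {:?}", uri_warnings(uri));
    }
}
```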

Model not found

  • List what the endpoint advertises:

    curl -sS "$URI/models" -H "Authorization: Bearer $API_KEY" | jq
    
  • If the endpoint doesn’t implement /models, send a direct chat request and read the error — most endpoints return the expected model family in the error body.

  • Gateway services often expose only a subset of upstream models.

Connection problems

  • curl -I $URI — does it respond?
  • Firewall, proxy, egress rules? VPS providers sometimes block outbound high ports.
  • Check the vendor status page if it’s a hosted service.
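If curl is not available on the ZeroClaw host, a plain TCP connect answers the same "does it respond?" question at a lower level. A minimal std-only sketch: can_connect is a hypothetical helper, it takes a host:port pair rather than a full uri, and a successful connect only proves the port is reachable, not that the API behind it is healthy.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Hypothetical helper: true if a TCP connection to `host_port`
/// (for example "127.0.0.1:8033") succeeds within `timeout`.
fn can_connect(host_port: &str, timeout: Duration) -> bool {
    match host_port.to_socket_addrs() {
        // Try every resolved address; one success is enough.
        Ok(mut addrs) => addrs.any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok()),
        // Malformed input or a failed lookup counts as unreachable.
        Err(_) => false,
    }
}

fn main() {
    let target = "127.0.0.1:8033"; // the llama-server port used earlier on this page
    let reachable = can_connect(target, Duration::from_secs(2));
    println!("{target} reachable: {reachable}");
}
```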

See also

  • Overview — provider model and how per-agent dispatch works
  • Configuration — full [providers.*] schema, Azure typed config, regional and OAuth variants
  • Catalog — every canonical slot with a worked TOML example
  • Developing → Plugin protocol: if a plugin is a better fit than a native crate