Model Providers & Routing
Provider Format
Models are specified as provider:model_id. The bot automatically maps FAST_API_KEY / DEEP_API_KEY to the correct provider SDK env var at startup — you do not need to set OPENAI_API_KEY, ANTHROPIC_API_KEY, etc. yourself.
| Provider prefix | Example | API key source |
|---|---|---|
| google: | google:gemini-2.5-flash | Google AI Studio |
| openai: | openai:gpt-4o | OpenAI Platform |
| anthropic: | anthropic:claude-sonnet-4-20250514 | Anthropic Console |
| groq: | groq:llama-3.3-70b-versatile | Groq Console |
| openrouter: | openrouter:meta-llama/llama-3.3-70b-instruct | OpenRouter |
| mistral: | mistral:mistral-large-latest | Mistral Console |
| xai: | xai:grok-3 | xAI Console |
| deepseek: | deepseek:deepseek-chat | DeepSeek Platform |
| ollama: | ollama:llama3.2 | Local (no key needed) |
| lmstudio: | lmstudio:my-model | Local (no key needed) |
| vllm: | vllm:my-model | Local (no key needed) |
| openai-chat: | openai-chat:my-model | Any OpenAI-compatible API |
For the complete provider list, see the Agno model index.
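The provider:model_id format and the startup key mapping can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the function names and the PROVIDER_ENV_VARS table are hypothetical, and the env var names listed for google and groq are assumptions about what those SDKs read.

```python
import os

# Hypothetical mapping from provider prefix to the env var its SDK reads.
# Illustrative only; the bot's real table is not shown in this document.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",  # assumed name
    "groq": "GROQ_API_KEY",      # assumed name
}


def parse_model_spec(spec):
    """Split 'provider:model_id' on the first colon only.

    Model ids may contain '/' or further ':' characters, so everything
    after the first colon is kept as the model id.
    """
    provider, sep, model_id = spec.partition(":")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected provider:model_id, got {spec!r}")
    return provider, model_id


def export_provider_key(spec, api_key, env=None):
    """Copy a generic key into the provider-specific variable, if one is known."""
    env = os.environ if env is None else env
    provider, _ = parse_model_spec(spec)
    var = PROVIDER_ENV_VARS.get(provider)
    if var is not None:
        env[var] = api_key
```

Splitting on the first colon only keeps ids such as meta-llama/llama-3.3-70b-instruct intact. Providers missing from the table (local servers, for instance) are left untouched.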
Google Gemini / Gemma extras
google: models get additional features not available for other providers:
- Search grounding: automatically searches Google when the model needs current information (GEMINI_SEARCH=true by default)
- URL context: the model fetches and reads URLs mentioned in the conversation (GEMINI_URL_CONTEXT=false by default; not supported by gemma-* models)
- Thinking budget: controls reasoning token usage (GEMINI_THINKING_BUDGET, Gemini 2.5+ and Gemma 4)
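One way to picture how these three settings combine for a given model is the sketch below. The function name gemini_extras and the set of accepted truthy strings are assumptions for illustration. Only the defaults and the gemma-* restriction on URL context come from the list above, and the sketch does not check whether a model version supports a thinking budget.

```python
def gemini_extras(model_id, env):
    """Resolve the Gemini-only toggles for a model id (without the 'google:' prefix).

    `env` is any mapping of variable names to strings, e.g. dict(os.environ).
    """
    def flag(name, default):
        # Accepted truthy spellings are an assumption of this sketch.
        return env.get(name, default).strip().lower() in ("true", "1", "on", "yes")

    is_gemma = model_id.startswith("gemma-")
    budget = env.get("GEMINI_THINKING_BUDGET")
    return {
        "search": flag("GEMINI_SEARCH", "true"),  # on by default
        # Off by default, and never enabled for gemma-* models.
        "url_context": flag("GEMINI_URL_CONTEXT", "false") and not is_gemma,
        "thinking_budget": int(budget) if budget else None,
    }
```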
Dual-model routing
Set DEEP_MODEL to enable a second model for complex messages:
FAST_MODEL=google:gemma-4-26b-a4b-it # everyday questions
FAST_API_KEY=your_key
DEEP_MODEL=google:gemini-2.5-pro # complex analysis
DEEP_API_KEY=your_key # can be same key if same provider
AUTO_ROUTE
With AUTO_ROUTE=on, the bot scores each message for complexity before routing:
- Simple → fast model (acknowledgments, translations, short questions)
- Complex → deep model (code, math, multi-part questions, reasoning)
The classifier (complexity_router.py) is multilingual — it handles EN, ZH, JA, KO, ES, FR, DE, RU, PT, VI, TH, and AR.
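The score-then-route idea can be illustrated with a toy heuristic. This is not the logic in complexity_router.py, which this page does not show: the regex signals, the threshold, and the function names are all assumptions, and the word-count signal would not work for languages written without spaces. The sketch also assumes that messages go to the fast model when AUTO_ROUTE is off.

```python
import re

# Toy signals only; stand-ins for whatever the real classifier uses.
CODE_HINT = re.compile(r"```|\bdef\b|\bclass\b|[{};]")
MATH_HINT = re.compile(r"\d\s*[-+*/^=]\s*\d")


def complexity_score(message):
    """Return a small integer; higher means 'more likely complex'."""
    score = 0
    if CODE_HINT.search(message):
        score += 2
    if MATH_HINT.search(message):
        score += 2
    if message.count("?") > 1:        # multi-part question
        score += 1
    if len(message.split()) > 40:     # long message
        score += 1
    return score


def pick_model(message, auto_route, has_deep, threshold=2):
    """Route to 'deep' only when routing is on, a deep model exists,
    and the score reaches the (hypothetical) threshold."""
    if auto_route and has_deep and complexity_score(message) >= threshold:
        return "deep"
    return "fast"
```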
/deep command
Force the deep model for a single message regardless of AUTO_ROUTE. Requires DEEP_MODEL to be set.
Auto-route on URL (Gemini-specific)
If the fast model is a Gemini model without URL context enabled but the deep model has it enabled, any message containing a URL is automatically routed to the deep model, so the model that can read URLs handles it.
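The URL rule reduces to one condition, sketched below. The URL regex and the function name route_for_url are illustrative assumptions. The two booleans stand for whether URL context is enabled on each model.

```python
import re

# Simple http(s) matcher; the bot's real URL detection may differ.
URL_RE = re.compile(r"https?://\S+")


def route_for_url(message, fast_url_context, deep_url_context, default="fast"):
    """Send URL-bearing messages to the deep model only when it is the
    one model that can read URLs; otherwise keep the default route."""
    has_url = URL_RE.search(message) is not None
    if has_url and not fast_url_context and deep_url_context:
        return "deep"
    return default
```

When both models can read URLs, or neither can, the rule does not fire and the normal routing decision stands.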
Error fallback
With FALLBACK_ON_ERROR=on, if one model returns an error, the bot retries with the other:
- fast → deep: fast model fails → retry with deep model
- deep → fast: deep model fails → retry with fast model
A [dango-sysinfo] note appears in the channel when fallback fires.
Fallback works best when fast and deep use different providers, because same-provider fallback cannot protect against provider-wide outages. If both models share a provider, this warning is printed at startup:
⚠️ [config] FAST_MODEL and DEEP_MODEL share provider 'google' — FALLBACK_ON_ERROR won't protect against provider-wide outages.
Non-Gemini providers retry 2× before triggering fallback. Gemini handles its own retries internally.
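The retry-then-fallback flow for a non-Gemini primary might look like the sketch below. It assumes "retry 2×" means one initial attempt plus two retries, which the text does not spell out. The name call_with_fallback and the callable-based interface are hypothetical.

```python
def call_with_fallback(primary, secondary, prompt, retries=2):
    """Try `primary` (1 attempt + `retries` retries), then `secondary` once.

    Returns (reply, used_fallback). `primary` and `secondary` are any
    callables that take a prompt and either return a reply or raise.
    """
    last_error = None
    for _ in range(1 + retries):
        try:
            return primary(prompt), False
        except Exception as exc:  # sketch: treat every error as retryable
            last_error = exc
    try:
        return secondary(prompt), True
    except Exception as exc:
        # Both models failed; keep the primary's last error as the cause.
        raise exc from last_error
```

The used_fallback flag is where a caller could decide to post a notice that fallback fired.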
Local models
Point FAST_BASE_URL at a local inference server:
FAST_MODEL=ollama:llama3.2
FAST_API_KEY=local # most local servers accept any non-empty string
FAST_BASE_URL=http://localhost:11434
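If the bot runs in Docker and the model server runs on the host, localhost inside the container refers to the container itself, so FAST_BASE_URL must use an address that reaches the host instead.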
Context token budget
CONTEXT_TOKEN_BUDGET caps the total input tokens per request. When history exceeds the limit, the oldest messages are dropped first to stay within budget.
Use FAST_CONTEXT_TOKEN_BUDGET / DEEP_CONTEXT_TOKEN_BUDGET to set different limits per model.
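Dropping the oldest messages first is equivalent to keeping the longest run of newest messages that fits the budget. A minimal sketch follows, assuming history is a list of strings ordered oldest to newest. The function names are hypothetical, and estimate_tokens is a crude stand-in of about four characters per token, not the bot's tokenizer.

```python
def estimate_tokens(text):
    # Crude stand-in: roughly 4 characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


def trim_history(history, budget, count=estimate_tokens):
    """Return the newest messages whose combined token count fits `budget`.

    Walks from newest to oldest and stops at the first message that
    would exceed the budget, so older messages are the ones dropped.
    """
    kept = []
    total = 0
    for message in reversed(history):
        cost = count(message)
        if total + cost > budget:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore oldest-to-newest order
    return kept
```

A per-model limit would then amount to calling trim_history with the budget for whichever model handles the request.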