Model Providers & Routing

Provider Format

Models are specified as provider:model_id. The bot automatically maps FAST_API_KEY / DEEP_API_KEY to the correct provider SDK env var at startup — you do not need to set OPENAI_API_KEY, ANTHROPIC_API_KEY, etc. yourself.

| Provider prefix | Example | API key source |
|---|---|---|
| google: | google:gemini-2.5-flash | Google AI Studio |
| openai: | openai:gpt-4o | OpenAI Platform |
| anthropic: | anthropic:claude-sonnet-4-20250514 | Anthropic Console |
| groq: | groq:llama-3.3-70b-versatile | Groq Console |
| openrouter: | openrouter:meta-llama/llama-3.3-70b-instruct | OpenRouter |
| mistral: | mistral:mistral-large-latest | Mistral Console |
| xai: | xai:grok-3 | xAI Console |
| deepseek: | deepseek:deepseek-chat | DeepSeek Platform |
| ollama: | ollama:llama3.2 | Local (no key needed) |
| lmstudio: | lmstudio:my-model | Local (no key needed) |
| vllm: | vllm:my-model | Local (no key needed) |
| openai-chat: | openai-chat:my-model | Any OpenAI-compatible API |

For the complete provider list, see the Agno model index.
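
To make the provider:model_id format and the key mapping concrete, here is a minimal sketch of how such a startup step could look. It is not the bot's actual code: the function names and the PROVIDER_ENV_VARS table are hypothetical, and the env var names are the ones each SDK conventionally reads, assumed here for illustration.

```python
from __future__ import annotations

import os

# Assumed mapping from provider prefix to the env var its SDK usually reads.
# Illustrative only; the bot's real table may differ.
PROVIDER_ENV_VARS = {
    "google": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "xai": "XAI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def parse_model_spec(spec: str) -> tuple[str, str]:
    """Split 'provider:model_id' on the first colon only."""
    provider, sep, model_id = spec.partition(":")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected provider:model_id, got {spec!r}")
    return provider, model_id


def export_provider_key(spec: str, api_key: str, env=None) -> str | None:
    """Copy api_key into the provider's env var.

    Returns the variable name that was set, or None when the provider
    is not in the table (for example a local server that needs no key).
    """
    env = os.environ if env is None else env
    provider, _ = parse_model_spec(spec)
    var = PROVIDER_ENV_VARS.get(provider)
    if var is None or not api_key:
        return None
    env[var] = api_key
    return var
```

Splitting on the first colon only keeps model IDs that themselves contain a colon or slash intact, such as openrouter:meta-llama/llama-3.3-70b-instruct.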

Google Gemini / Gemma extras

google: models get additional features not available for other providers:

  • Search grounding — automatically searches Google when the model needs current information (GEMINI_SEARCH=true by default)
  • URL context — the model fetches and reads URLs mentioned in the conversation (GEMINI_URL_CONTEXT=false by default; not supported by gemma-* models)
  • Thinking budget — control reasoning token usage (GEMINI_THINKING_BUDGET, Gemini 2.5+ and Gemma 4)

Dual-model routing

Set DEEP_MODEL to enable a second model for complex messages:

FAST_MODEL=google:gemma-4-26b-a4b-it   # everyday questions
FAST_API_KEY=your_key

DEEP_MODEL=google:gemini-2.5-pro        # complex analysis
DEEP_API_KEY=your_key                   # can be same key if same provider

AUTO_ROUTE

With AUTO_ROUTE=on, the bot scores each message for complexity before routing:

  • Simple → fast model (acknowledgments, translations, short questions)
  • Complex → deep model (code, math, multi-part questions, reasoning)

The classifier (complexity_router.py) is multilingual — it handles EN, ZH, JA, KO, ES, FR, DE, RU, PT, VI, TH, and AR.
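
As a rough illustration of score-then-route, the toy scorer below adds points for a few surface signals and sends the message to the deep model once a threshold is reached. Everything in it is an assumption: the signals, the weights, the threshold, and the function names are invented for this sketch, it only understands English, and the real complexity_router.py may work quite differently.

```python
import re

# Hypothetical signals; not taken from complexity_router.py.
CODE_OR_MATH = re.compile(r"\bdef\s|\bclass\s|\bimport\s|\d\s*[-+*/^=]\s*\d")
REASONING_WORDS = ("why", "explain", "compare", "prove", "analyze", "step by step")


def complexity_score(message: str) -> int:
    """Add up points for signals that suggest a complex message."""
    text = message.lower()
    score = 0
    if CODE_OR_MATH.search(message):
        score += 2  # code keywords or arithmetic
    if any(word in text for word in REASONING_WORDS):
        score += 1  # asks for reasoning
    if text.count("?") >= 2:
        score += 1  # multi-part question
    if len(text.split()) > 40:
        score += 1  # long message
    return score


def pick_model(message: str, threshold: int = 2) -> str:
    """Return 'deep' when the score reaches the threshold, else 'fast'."""
    return "deep" if complexity_score(message) >= threshold else "fast"
```

With these made-up weights, a short acknowledgment such as "ok thanks" scores 0 and stays on the fast model, while a message containing arithmetic plus a request to explain it crosses the threshold.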

/deep command

Force the deep model for a single message regardless of AUTO_ROUTE:

/deep <your message>

Requires DEEP_MODEL to be set.

Auto-route on URL (Gemini-specific)

If the fast model is a Gemini model with URL context disabled and the deep model has it enabled, any message containing a URL is automatically routed to the deep model, so that the model able to read URLs handles it.
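
The three routing rules described so far (/deep, the URL rule, and AUTO_ROUTE) can be combined into one decision function. The sketch below is one possible arrangement, not the bot's implementation: RouteConfig, route, and the looks_complex callback are hypothetical names, the order in which the rules are checked is an assumption, and so is the choice to fall back to the fast model when /deep is used without DEEP_MODEL.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

URL_PATTERN = re.compile(r"https?://\S+")


@dataclass
class RouteConfig:
    """Hypothetical snapshot of the relevant settings."""
    deep_model_set: bool = True     # DEEP_MODEL is configured
    auto_route: bool = False        # AUTO_ROUTE=on
    fast_url_context: bool = False  # fast model can read URLs
    deep_url_context: bool = True   # deep model can read URLs


def route(message: str, cfg: RouteConfig, looks_complex=lambda m: False) -> tuple[str, str]:
    """Return (target, text), where target is 'fast' or 'deep'."""
    if not cfg.deep_model_set:
        # Assumed behavior: without DEEP_MODEL everything goes to fast.
        return "fast", message
    if message.startswith("/deep "):
        # Explicit override for a single message; strip the command.
        return "deep", message[len("/deep "):].strip()
    if URL_PATTERN.search(message) and cfg.deep_url_context and not cfg.fast_url_context:
        # Only the deep model can read the URL.
        return "deep", message
    if cfg.auto_route and looks_complex(message):
        return "deep", message
    return "fast", message
```

Passing the complexity check in as a callback keeps the sketch independent of whatever classifier is actually used.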

Error fallback

With FALLBACK_ON_ERROR=on, if one model returns an error, the bot retries with the other:

  • fast → deep: fast model fails → retry with deep model
  • deep → fast: deep model fails → retry with fast model

A [dango-sysinfo] note appears in the channel when fallback fires — the message format is:

> [dango-sysinfo] ⚡ google:gemini-2.0-flash failed — response served by google:gemini-2.5-pro.

Fallback works best when the fast and deep models use different providers, since same-provider fallback cannot protect against provider-wide outages. If both models share a provider, this warning is printed at startup:

⚠️  [config] FAST_MODEL and DEEP_MODEL share provider 'google' — FALLBACK_ON_ERROR won't protect against provider-wide outages.

Requests to non-Gemini providers are retried twice before fallback is triggered. Gemini handles its own retries internally.
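
The retry-then-fallback flow can be sketched as follows. This is an illustration under stated assumptions rather than the bot's code: call_with_retries and answer are hypothetical names, "retry 2×" is read as one initial attempt plus two retries, and the second model is assumed to get the same retry treatment. The notice string follows the format shown above.

```python
from typing import Callable, Optional, Tuple

# A caller takes a prompt and returns a reply, raising on failure.
Caller = Callable[[str], str]


def call_with_retries(call: Caller, prompt: str, retries: int = 2) -> str:
    """Try once, then up to `retries` more times; re-raise the last error."""
    last_error: Optional[Exception] = None
    for _ in range(1 + retries):
        try:
            return call(prompt)
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
    assert last_error is not None
    raise last_error


def answer(
    prompt: str,
    primary: Caller,
    secondary: Caller,
    primary_name: str,
    secondary_name: str,
    fallback_on_error: bool = True,
) -> Tuple[str, Optional[str]]:
    """Return (reply, notice); notice is None unless fallback fired."""
    try:
        return call_with_retries(primary, prompt), None
    except Exception:
        if not fallback_on_error:
            raise
        reply = call_with_retries(secondary, prompt)
        notice = (
            f"[dango-sysinfo] ⚡ {primary_name} failed "
            f"— response served by {secondary_name}."
        )
        return reply, notice
```

The same function covers both directions: pass the fast model as primary for fast → deep, or the deep model as primary for deep → fast.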

Local models

Point FAST_BASE_URL at a local inference server:

FAST_MODEL=ollama:llama3.2
FAST_API_KEY=local           # most local servers accept any non-empty string
FAST_BASE_URL=http://localhost:11434

If the bot runs in Docker and the model server runs on the host:

FAST_BASE_URL=http://host.docker.internal:11434

Context token budget

CONTEXT_TOKEN_BUDGET caps the total input tokens per request. When history exceeds the limit, the oldest messages are dropped first to stay within budget.

CONTEXT_TOKEN_BUDGET=8192   # 0 = no limit (default)

Use FAST_CONTEXT_TOKEN_BUDGET / DEEP_CONTEXT_TOKEN_BUDGET to set different limits per model.
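
The oldest-first trimming can be illustrated with a short sketch. The names trim_history and estimate_tokens are hypothetical, and the four-characters-per-token estimate is a crude stand-in for a real tokenizer, so actual counts will differ.

```python
from __future__ import annotations


def estimate_tokens(text: str) -> int:
    """Crude stand-in: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_history(history: list[str], new_message: str, budget: int) -> list[str]:
    """Drop the oldest messages until history plus new_message fits.

    A budget of 0 means no limit, matching the documented default.
    """
    if budget <= 0:
        return list(history)
    remaining = budget - estimate_tokens(new_message)
    kept: list[str] = []
    for msg in reversed(history):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if cost > remaining:
            break  # this message and everything older is dropped
        kept.append(msg)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return kept
```

Walking from the newest message backwards and stopping at the first one that does not fit keeps the retained history contiguous, so the model never sees a conversation with holes in the middle.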