Getting Started
All available models on CallMissed — Indic STT/TTS/LLM, fast direct-routed models, and 300+ frontier text models. All accessible through one OpenAI-compatible API.
Overview
CallMissed provides access to a tiered model catalog through a single OpenAI-compatible API:
- Fast LLMs — Kimi K2.5 at 414 tokens/second on Nvidia B200 GPUs. The default and fastest LLM for voice agents.
- Indic Models — purpose-built for Indian languages. STT, TTS, and LLM optimized for Hindi, Tamil, Telugu, Bengali, and 19 more.
- Direct-Routed LLMs — sub-2s open-weights models (Kimi K2.5/K2.6, GPT-OSS, Gemma-4, GLM, Nemotron, Mistral Small).
- Frontier Models — 300+ models from every major creator (OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more) through one endpoint.
All models use the same authentication and request format. Just change the model field.
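As a minimal sketch of "same request, different model field", the snippet below builds (but does not send) an OpenAI-style chat request using only the standard library. The full URL is an assumption: it joins the api.callmissed.com host from the curl examples with the /v1/chat/completions path mentioned elsewhere on this page. The `CALLMISSED_API_KEY` variable name and the `build_chat_request` helper are hypothetical.

```python
import json
import os
import urllib.request

# Assumed URL: documented host + documented /v1/chat/completions path.
CHAT_URL = "https://api.callmissed.com/v1/chat/completions"


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical environment variable name; the fallback value is a placeholder.
api_key = os.environ.get("CALLMISSED_API_KEY", "cm_example")

# Same key and request shape for every tier; only the model ID changes.
for model in ("sarvam-30b", "kimi-k2.5", "openai/gpt-5.4-mini"):
    req = build_chat_request(model, "Hello", api_key)
    print(model, "->", req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=30)
```

Building the request separately from sending it keeps the example runnable offline and makes the per-model difference easy to inspect.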
Models API
List all available models programmatically. No authentication required.
# List all models
curl https://api.callmissed.com/api/v1/models
# Filter by category: llm, stt, tts
curl "https://api.callmissed.com/api/v1/models?category=llm"
# Filter free-plan models only
curl "https://api.callmissed.com/api/v1/models?free=true"
# Get a specific model
curl https://api.callmissed.com/api/v1/models/sarvam-30b
# Which models each plan tier can call
curl https://api.callmissed.com/api/v1/models/access
Response includes: id, name, description, category, owned_by, context_window, context_length (alias of context_window for OpenAI-style clients), pricing, free, supports_streaming, supports_tools, supports_reasoning, and supports_vision.
The OpenAI-compatible listing at GET /v1/models (requires Authorization: Bearer cm_*) returns the same enriched fields, and the Anthropic-shape listing at GET /anthropic/v1/models surfaces them inside Anthropic's {data, has_more, first_id, last_id} envelope.
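The capability flags in the response make client-side filtering straightforward. The sketch below builds a listing URL and filters an already-fetched list by boolean fields. The helper names are hypothetical, the sample records are invented for illustration (only the field names come from this page), and combining `category` and `free` in one query is an assumption.

```python
from urllib.parse import urlencode

MODELS_URL = "https://api.callmissed.com/api/v1/models"


def models_url(category=None, free=False):
    """Build the listing URL; passing both filters at once is an assumption."""
    params = {}
    if category is not None:
        params["category"] = category
    if free:
        params["free"] = "true"
    return f"{MODELS_URL}?{urlencode(params)}" if params else MODELS_URL


def with_capabilities(models, *flags):
    """Keep models whose named boolean capability fields are all truthy."""
    return [m for m in models if all(m.get(flag) for flag in flags)]


# Invented sample records shaped like the documented response fields.
sample = [
    {"id": "model-a", "free": True, "supports_tools": True, "supports_vision": False},
    {"id": "model-b", "free": False, "supports_tools": True, "supports_vision": True},
]

print(models_url(category="llm"))
print([m["id"] for m in with_capabilities(sample, "supports_tools", "supports_vision")])
```

In practice the `sample` list would be replaced by the `data` array of a real listing response.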
Free Plan Models
The free tier includes 21 models across four categories (8 LLM, 3 STT, 4 TTS, 6 image). Use GET /api/v1/models?free=true to list them, or see the Model Access by Plan page for the full breakdown.
LLM (8 models)
| Model ID | Description |
|---|---|
sarvam-30b | 30B MoE — Indic languages, cost-efficient |
sarvam-105b | 105B MoE — complex reasoning, Indic languages |
kimi-k2.5 | Moonshot K2.5 — 262K context, reasoning |
kimi-k2.6 | Moonshot K2.6 — improved reasoning + coding, 262K context |
glm-4.7-flash | GLM 4.7 Flash — fast inference |
gpt-oss-120b | GPT-OSS 120B — open-weights large model |
nemotron-3-super | Nvidia Nemotron 3 Super |
gemma-4-26b-a4b-it | Google Gemma 4 26B |
STT (3 models)
| Model ID | Description |
|---|---|
saaras:v3 | 23 langs (22 Indic + English), best for code-mixed |
whisper-large-v3-turbo | Whisper — 99 langs with auto-detect, transcribe + translate |
nova-3 | Nova 3 — 11 langs, diarization, smart-format, streaming-capable |
TTS (4 models)
| Model ID | Description |
|---|---|
bulbul:v3 | 39 voices, 11 Indian languages |
aura-2-en | Aura 2 — 39 English voices, low-latency streaming |
aura-2-es | Aura 2 — 10 Spanish voices, low-latency streaming |
melotts | MeloTTS — en + fr, cheapest TTS available |
Image (6 models)
| Model ID | Description |
|---|---|
flux-2-klein-9b | Flux 2 Klein — highest quality |
flux-2-dev | Flux 2 Dev — flagship fidelity |
lucid-origin | Lucid Origin — cinematic |
phoenix-1.0 | Phoenix — photorealistic |
sdxl-lightning | SDXL Lightning — fast |
dreamshaper-8-lcm | DreamShaper 8 LCM — fast |
nano-banana-2 | Google Gemini 3.1 Flash Image — multimodal, highest LM-Arena Elo (paid) |
nano-banana-pro | Google Gemini 3 Pro Image — flagship typography + fidelity (paid) |
All other models (e.g. kimi-k2.5-fast, openai/*, anthropic/*, google/*, x-ai/*, qwen/*, mistralai/*) require a paid plan (Starter, Pro, or Enterprise).
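For a quick offline check before sending a request, the free-tier tables above can be captured as a static lookup. This is only a snapshot of this page and will go stale; the `?free=true` listing remains the authoritative source. The function names are hypothetical.

```python
# Snapshot of the free-tier tables on this page (not fetched from the API).
FREE_MODELS = {
    "llm": {
        "sarvam-30b", "sarvam-105b", "kimi-k2.5", "kimi-k2.6",
        "glm-4.7-flash", "gpt-oss-120b", "nemotron-3-super", "gemma-4-26b-a4b-it",
    },
    "stt": {"saaras:v3", "whisper-large-v3-turbo", "nova-3"},
    "tts": {"bulbul:v3", "aura-2-en", "aura-2-es", "melotts"},
    "image": {
        "flux-2-klein-9b", "flux-2-dev", "lucid-origin",
        "phoenix-1.0", "sdxl-lightning", "dreamshaper-8-lcm",
    },
}


def free_category(model_id):
    """Return the free-tier category for a model ID, or None if it is not listed."""
    for category, ids in FREE_MODELS.items():
        if model_id in ids:
            return category
    return None


def requires_paid_plan(model_id):
    """True when the model is absent from the free-tier snapshot."""
    return free_category(model_id) is None


for model_id in ("saaras:v3", "kimi-k2.5-fast", "nano-banana-pro"):
    print(model_id, "paid" if requires_paid_plan(model_id) else "free")
```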
Pricing
All models are pay-per-use. Pricing is in USD.
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
kimi-k2.5-fast | $0.52 | $2.30 |
sarvam-30b | $0.35 (₹30) | $0.35 (₹30) |
sarvam-105b | $0.35 (₹30) | $0.35 (₹30) |
openai/gpt-5.4-mini | $1.00 | $6.00 |
openai/gpt-5.4 | $3.50 | $20.00 |
openai/gpt-5.4-pro | $40.00 | $240.00 |
anthropic/claude-sonnet-4.6 | $4.00 | $20.00 |
anthropic/claude-opus-4.6 | $7.00 | $35.00 |
google/gemini-3.1-pro-preview | $2.00 | $12.00 |
google/gemini-3-flash-preview | $0.50 | $3.00 |
google/gemini-3.1-flash-lite | $0.25 | $1.50 |
| STT Model | Price |
|---|---|
saaras:v3 | $0.53 / hour (₹45/hr) |
whisper-large-v3-turbo | $0.06 / hour |
nova-3 | $0.50 / hour |
| TTS Model | Price |
|---|---|
bulbul:v3 | $0.53 / 10K chars (₹45/10K) |
aura-2-en | $0.40 / 10K chars |
aura-2-es | $0.40 / 10K chars |
melotts | $0.05 / 10K chars |
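The three pricing units (per 1M tokens, per hour, per 10K characters) can be turned into rough cost estimates as follows. The sketch assumes strictly linear billing with no minimums or rounding, which this page does not confirm. Prices are copied from the tables above and the function names are hypothetical.

```python
# Prices in USD, copied from the tables above.
LLM_PER_1M_TOKENS = {  # model -> (input price, output price)
    "kimi-k2.5-fast": (0.52, 2.30),
    "openai/gpt-5.4-mini": (1.00, 6.00),
}
STT_PER_HOUR = {"saaras:v3": 0.53, "whisper-large-v3-turbo": 0.06, "nova-3": 0.50}
TTS_PER_10K_CHARS = {"bulbul:v3": 0.53, "aura-2-en": 0.40, "aura-2-es": 0.40, "melotts": 0.05}


def llm_cost(model, input_tokens, output_tokens):
    """Estimated chat cost, assuming linear per-token billing."""
    in_price, out_price = LLM_PER_1M_TOKENS[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def stt_cost(model, audio_seconds):
    """Estimated transcription cost, assuming per-second proration of the hourly rate."""
    return STT_PER_HOUR[model] * audio_seconds / 3600


def tts_cost(model, characters):
    """Estimated synthesis cost, assuming per-character proration of the 10K rate."""
    return TTS_PER_10K_CHARS[model] * characters / 10_000


print(f"chat: ${llm_cost('openai/gpt-5.4-mini', 2_000, 500):.4f}")
print(f"stt:  ${stt_cost('nova-3', 30 * 60):.4f}")
print(f"tts:  ${tts_cost('melotts', 20_000):.4f}")
```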
Full pricing for all models is available via the API: GET /api/v1/models
import requests
# List all LLM models (no authentication required)
resp = requests.get("https://api.callmissed.com/api/v1/models?category=llm", timeout=10)
resp.raise_for_status()
for m in resp.json()["data"]:
    tier = "FREE" if m["free"] else "PAID"
    print(f"{m['id']} - {m['name']} ({m['context_window']} tokens) {tier}")
Fast LLMs
High-throughput Kimi K2.5 inference tier optimized for voice-agent latency.
| Model ID | Status | Context | Best For |
|---|---|---|---|
kimi-k2.5-fast | Under maintenance — fall back to kimi-k2.5 | 262K | Voice agents, fast inference, reasoning tasks |
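The fallback noted in the table can be automated on the client. The sketch below retries once with the fallback model when the primary is unavailable, assuming maintenance surfaces as an HTTP 503. `ServiceUnavailable`, `complete_with_fallback`, and the injected `send` callable are hypothetical stand-ins for whatever HTTP client or SDK you use.

```python
class ServiceUnavailable(Exception):
    """Stand-in for however your HTTP client reports an HTTP 503."""


# Fallback pair taken from the status column above.
FALLBACKS = {"kimi-k2.5-fast": "kimi-k2.5"}


def complete_with_fallback(send, model, messages):
    """Call send(model, messages); on a 503, retry once with the fallback model."""
    try:
        return send(model, messages)
    except ServiceUnavailable:
        fallback = FALLBACKS.get(model)
        if fallback is None:
            raise  # no known fallback: surface the original error
        return send(fallback, messages)


def fake_send(model, messages):
    """Simulated transport: the fast tier is down, everything else answers."""
    if model == "kimi-k2.5-fast":
        raise ServiceUnavailable("HTTP 503")
    return {"model": model, "reply": "ok"}


result = complete_with_fallback(
    fake_send, "kimi-k2.5-fast", [{"role": "user", "content": "Hello"}]
)
print(result["model"])  # kimi-k2.5
```

With the OpenAI Python SDK, the `send` callable would typically wrap `client.chat.completions.create` and translate a status error whose `status_code` is 503 into `ServiceUnavailable`.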
While kimi-k2.5-fast is in maintenance (returns HTTP 503), use kimi-k2.5:
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Hello"}]
)
Indic Models
Speech to Text
| Model | Description | Languages |
|---|---|---|
saaras:v3 | Latest STT — best accuracy on Indian + code-mixed | 23 languages (22 Indic + English) |
For 99-language general-purpose transcription, see whisper-large-v3-turbo. For diarization + smart-format on calls, see nova-3. Both are free-tier and live under the audio model routes.
Text to Speech
| Model | Description | Voices |
|---|---|---|
bulbul:v3 | Natural TTS — 39 voices, 11 Indian languages | shubh (default) + 38 more |
For low-latency English / Spanish voice agents, see aura-2-en / aura-2-es. For ultra-cheap en/fr notification audio, see melotts. All three are free-tier.
Chat Completion (LLM)
| Model | Params | Context | Best For |
|---|---|---|---|
sarvam-30b | 30B MoE (2.4B active) | 64K tokens | Real-time chat, Indic languages, cost-efficient |
sarvam-105b | 105B MoE | 128K tokens | Complex reasoning, agentic tasks, long documents |
Both sarvam-30b and sarvam-105b support hybrid thinking mode via reasoning_effort: "low" | "medium" | "high". The values "none" and "minimal" are mapped to "low" (verified 2026-05-01), so OpenAI-style clients that send reasoning_effort: "none" to turn thinking off still get a 200, although thinking then runs at the "low" setting rather than being disabled. To disable thinking fully, use the direct-routed kimi-k2.5, kimi-k2.6, or gemma-4-26b-a4b-it models.
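To make the mapping explicit in client code, the sketch below predicts the effort level a Sarvam model will actually use for a requested value. It mirrors only what this section states. Rejecting unknown values with `ValueError` is a client-side choice made for this example, not documented API behaviour, and the function name is hypothetical.

```python
SARVAM_MODELS = {"sarvam-30b", "sarvam-105b"}
SARVAM_EFFORTS = ("low", "medium", "high")


def effective_reasoning_effort(model, requested):
    """Return the effort a Sarvam model is described as using for `requested`."""
    if model not in SARVAM_MODELS:
        # Other models are outside what this section describes; pass through unchanged.
        return requested
    if requested in ("none", "minimal"):
        return "low"  # accepted by the API, but thinking is not disabled
    if requested in SARVAM_EFFORTS:
        return requested
    # Client-side guard (assumption): fail early on values this page does not list.
    raise ValueError(f"unsupported reasoning_effort for {model}: {requested!r}")


print(effective_reasoning_effort("sarvam-30b", "none"))   # low
print(effective_reasoning_effort("sarvam-105b", "high"))  # high
```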
Audio Models
These models are available on every plan, including the free tier. Source-of-truth pricing lives in backend/app/services/credit_service.py.
Speech to Text
| Model | Languages | Best for | Price |
|---|---|---|---|
whisper-large-v3-turbo | 99 with auto-detect | Multilingual general-purpose; transcribe + translate | $0.06 / hour |
nova-3 | 11 BCP-47 incl. multi auto-detect | Diarization, smart-format, streaming voice agents | $0.50 / hour |
Text to Speech
| Model | Languages | Voices | Price |
|---|---|---|---|
aura-2-en | English | 39 (luna default) | $0.40 / 10K chars |
aura-2-es | Spanish | 10 (aquila default) | $0.40 / 10K chars |
melotts | English + French | 1 per language | $0.05 / 10K chars |
The following output formats are inferred rather than guaranteed, and may change as upstream providers update their schemas. Aura 2 returns linear16 PCM streamed at 24 kHz, which LiveKit's voice agent plays without an MP3 decode in the hot path. MeloTTS returns base64-encoded MP3, which is decoded by pyav.
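If you need to save or inspect these payloads outside a voice pipeline, the standard library is enough for the container work. The sketch below wraps raw linear16 PCM in a WAV header and decodes a base64 audio string to bytes. It assumes mono, little-endian samples at 24 kHz, which this page does not state beyond the sample rate, and the function names are hypothetical.

```python
import base64
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int = 24_000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container (mono is an assumption)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # linear16 means 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def decode_base64_audio(payload: str) -> bytes:
    """Decode a base64 audio string (for example MP3) to raw bytes for a decoder."""
    return base64.b64decode(payload)


# One second of silence stands in for real streamed audio.
silence = b"\x00\x00" * 24_000
wav_bytes = pcm16_to_wav(silence)
with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
    print(reader.getframerate(), reader.getnframes())
```

The MP3 bytes returned by `decode_base64_audio` still need an MP3 decoder before playback; this sketch stops at the byte level.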
Direct-Routed LLMs
Low-latency models routed directly through CallMissed — sub-2s end-to-end on small prompts and free-tier eligible per the reasoning_effort matrix.
| Model ID | Creator | Context |
|---|---|---|
kimi-k2.5 | Moonshot AI | 262K |
kimi-k2.6 | Moonshot AI | 262K |
gpt-oss-120b | OpenAI (open-weights) | 128K |
gemma-4-26b-a4b-it | Google | 128K |
glm-4.7-flash | Zhipu | 128K |
nemotron-3-super | NVIDIA | 128K |
mistral-small-3.1 | Mistral | 128K |
Frontier Models
Access 300+ frontier models via the same /v1/chat/completions endpoint. Use the slash-prefixed model ID as the model field.
Popular Models
| Model ID | Creator | Context |
|---|---|---|
openai/gpt-5.4-pro | OpenAI | 1M |
openai/gpt-5.4-mini | OpenAI | 1M |
anthropic/claude-opus-4.6 | Anthropic | 1M |
anthropic/claude-sonnet-4.6 | Anthropic | 1M |
google/gemini-3.1-pro-preview | Google | 1M |
google/gemini-3-flash-preview | Google | 1M |
google/gemini-3.1-flash-lite | Google | 1M |
x-ai/grok-4.20 | xAI | 256K |
qwen/qwen3.5-plus | Qwen | 256K |
auto | Auto Router | — |
Use auto to let CallMissed select the best free model for your prompt automatically.
Model Selection
Pass the model ID in your request:
# Sarvam LLM
response = client.chat.completions.create(
    model="sarvam-30b",
    messages=[{"role": "user", "content": "Hello in Hindi"}]
)
# Indic LLM with thinking mode
response = client.chat.completions.create(
    model="sarvam-105b",
    messages=[{"role": "user", "content": "Solve this step by step"}],
    extra_body={"reasoning_effort": "high"}
)
# Frontier model
response = client.chat.completions.create(
    model="openai/gpt-5.4-mini",
    messages=[{"role": "user", "content": "Hello"}]
)
The API automatically routes to the correct backend based on the model ID:
- Bare names (kimi-k2.5, kimi-k2.6, gpt-oss-120b, gemma-4-26b-a4b-it, mistral-small-3.1, …) → direct-routed
- sarvam-* prefix → Indic LLMs
- Slash-prefixed (openai/, anthropic/, google/, …) → frontier catalog
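The routing rule for chat models can be mirrored client-side, for example to label requests in logs. This is a sketch of the rule as described here, not the server's actual implementation, and `backend_for` is a hypothetical name. It does not special-case `auto`, `kimi-k2.5-fast`, or audio and image model IDs, none of which the rule above covers.

```python
def backend_for(model_id: str) -> str:
    """Classify a chat model ID using the routing rule described on this page."""
    if "/" in model_id:
        return "frontier"        # slash-prefixed, e.g. openai/..., anthropic/...
    if model_id.startswith("sarvam-"):
        return "indic"           # sarvam-* prefix
    return "direct-routed"       # remaining bare names


for model_id in ("openai/gpt-5.4-mini", "sarvam-105b", "kimi-k2.6"):
    print(f"{model_id}: {backend_for(model_id)}")
```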