Models | CallMissed Docs

MODELS

All available models on CallMissed — Indic STT/TTS/LLM, fast direct-routed models, and 300+ frontier text models. All accessible through one OpenAI-compatible API.

Overview

CallMissed provides access to a tiered model catalog through a single OpenAI-compatible API:

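Fast LLMs — Kimi K2.5 at 414 tokens/second on Nvidia B200 GPUs (model ID kimi-k2.5-fast). The default and fastest LLM for voice agents when available; it is currently under maintenance, so see the Fast LLMs section for the fallback.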
Indic Models — purpose-built for Indian languages. STT, TTS, and LLM optimized for Hindi, Tamil, Telugu, Bengali, and 19 more.
Direct-Routed LLMs — sub-2s open-weights models (Kimi K2.5/K2.6, GPT-OSS, Gemma-4, GLM, Nemotron, Mistral Small).
Frontier Models — 300+ models from every major creator (OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more) through one endpoint.

All models use the same authentication and request format. Just change the model field.
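
As a rough illustration of this point, the sketch below builds the same chat request for three model IDs taken from this page and changes only the model field. The helper name build_chat_request is hypothetical, the key is a placeholder, and the URL is an assumption pieced together from the host used in the curl examples and the /v1/chat/completions path mentioned under Frontier Models. Nothing is sent over the network.

```python
import json

# Assumption: host from the curl examples plus the /v1/chat/completions path
# mentioned under Frontier Models. Confirm the real base URL for your account.
CHAT_URL = "https://api.callmissed.com/v1/chat/completions"


def build_chat_request(model: str, prompt: str, api_key: str) -> dict:
    """Return the URL, headers, and JSON body for one chat completion call."""
    return {
        "url": CHAT_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }


# Same prompt and same (placeholder) key; only the model ID differs.
requests_by_model = {
    model: build_chat_request(model, "Hello", "cm_example_key")
    for model in ("kimi-k2.5", "sarvam-30b", "openai/gpt-5.4-mini")
}

for model, req in requests_by_model.items():
    print(model, json.dumps(req["body"]))
```

Keeping the request builder independent of the model makes it easy to switch tiers (direct-routed, Indic, frontier) from configuration rather than code.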

Models API

List all available models programmatically. No authentication required.

bash

# List all models
curl https://api.callmissed.com/api/v1/models

# Filter by category: llm, stt, tts
curl "https://api.callmissed.com/api/v1/models?category=llm"

# Filter free-plan models only
curl "https://api.callmissed.com/api/v1/models?free=true"

# Get a specific model
curl https://api.callmissed.com/api/v1/models/sarvam-30b

# Which models each plan tier can call
curl https://api.callmissed.com/api/v1/models/access

Response includes: id, name, description, category, owned_by, context_window, context_length (alias of context_window for OpenAI-style clients), pricing, free, supports_streaming, supports_tools, supports_reasoning, and supports_vision.

The OpenAI-compatible listing at GET /v1/models (requires Authorization: Bearer cm_*) returns the same enriched fields, and the Anthropic-shape listing at GET /anthropic/v1/models surfaces them inside Anthropic's {data, has_more, first_id, last_id} envelope.
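
The sketch below shows one way a client might consume these fields: reading the context size from either field name and filtering models by capability flags. The helper names are hypothetical, and the sample listing is made up for illustration (only the field names come from this page). It reads the data array, which the Anthropic-shape envelope also carries.

```python
from typing import Any, Optional


def context_size(model: dict[str, Any]) -> Optional[int]:
    """Read the context size from either documented field name."""
    # context_length is documented as an alias of context_window.
    return model.get("context_window", model.get("context_length"))


def models_with(listing: dict[str, Any], **flags: bool) -> list[str]:
    """Return IDs of models whose boolean fields match every given flag."""
    return [
        m["id"]
        for m in listing["data"]
        if all(m.get(name) == wanted for name, wanted in flags.items())
    ]


# Illustrative listing only: field names are from this page, values are invented.
sample_listing = {
    "data": [
        {"id": "model-a", "free": True, "supports_tools": True, "context_window": 1000},
        {"id": "model-b", "free": False, "supports_tools": True, "context_length": 2000},
        {"id": "model-c", "free": True, "supports_tools": False, "context_window": 3000},
    ],
    # Extra keys present in the Anthropic-shape envelope.
    "has_more": False,
    "first_id": "model-a",
    "last_id": "model-c",
}

print(models_with(sample_listing, free=True, supports_tools=True))
print([context_size(m) for m in sample_listing["data"]])
```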

Free Plan Models

The free tier includes 21 models across four categories (8 LLM, 3 STT, 4 TTS, and 6 image). Use GET /api/v1/models?free=true to list them, or see the Model Access by Plan page for the full breakdown.

LLM (8 models)

Model ID	Description
`sarvam-30b`	30B MoE — Indic languages, cost-efficient
`sarvam-105b`	105B MoE — complex reasoning, Indic languages
`kimi-k2.5`	Moonshot K2.5 — 262K context, reasoning
`kimi-k2.6`	Moonshot K2.6 — improved reasoning + coding, 262K context
`glm-4.7-flash`	GLM 4.7 Flash — fast inference
`gpt-oss-120b`	GPT-OSS 120B — open-weights large model
`nemotron-3-super`	Nvidia Nemotron 3 Super
`gemma-4-26b-a4b-it`	Google Gemma 4 26B

STT (3 models)

Model ID	Description
`saaras:v3`	23 langs (22 Indic + English), best for code-mixed
`whisper-large-v3-turbo`	Whisper — 99 langs with auto-detect, transcribe + translate
`nova-3`	Nova 3 — 11 langs, diarization, smart-format, streaming-capable

TTS (4 models)

Model ID	Description
`bulbul:v3`	39 voices, 11 Indian languages
`aura-2-en`	Aura 2 — 39 English voices, low-latency streaming
`aura-2-es`	Aura 2 — 10 Spanish voices, low-latency streaming
`melotts`	MeloTTS — en + fr, cheapest TTS available

Image (6 models)

Model ID	Description
`flux-2-klein-9b`	Flux 2 Klein — highest quality
`flux-2-dev`	Flux 2 Dev — flagship fidelity
`lucid-origin`	Lucid Origin — cinematic
`phoenix-1.0`	Phoenix — photorealistic
`sdxl-lightning`	SDXL Lightning — fast
`dreamshaper-8-lcm`	DreamShaper 8 LCM — fast

Paid image models, listed for comparison (not part of the free tier):

Model ID	Description
`nano-banana-2`	Google Gemini 3.1 Flash Image — multimodal, highest LM-Arena Elo (paid)
`nano-banana-pro`	Google Gemini 3 Pro Image — flagship typography + fidelity (paid)

All other models (e.g. kimi-k2.5-fast, openai/*, anthropic/*, google/*, x-ai/*, qwen/*, mistralai/*) require a paid plan (Starter, Pro, or Enterprise).
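
For quick client-side checks, the free-plan IDs from the tables above can be kept in a small lookup, as in the sketch below. This is only a snapshot of this page: the free field returned by GET /api/v1/models should be treated as authoritative, and the helper name free_category is hypothetical.

```python
from typing import Optional

# Free-plan model IDs copied from the tables above (a snapshot, not a live list).
FREE_MODELS = {
    "llm": {
        "sarvam-30b", "sarvam-105b", "kimi-k2.5", "kimi-k2.6",
        "glm-4.7-flash", "gpt-oss-120b", "nemotron-3-super", "gemma-4-26b-a4b-it",
    },
    "stt": {"saaras:v3", "whisper-large-v3-turbo", "nova-3"},
    "tts": {"bulbul:v3", "aura-2-en", "aura-2-es", "melotts"},
    "image": {
        "flux-2-klein-9b", "flux-2-dev", "lucid-origin",
        "phoenix-1.0", "sdxl-lightning", "dreamshaper-8-lcm",
    },
}


def free_category(model_id: str) -> Optional[str]:
    """Return the free-tier category for a model ID, or None if it needs a paid plan."""
    for category, ids in FREE_MODELS.items():
        if model_id in ids:
            return category
    return None


print(free_category("sarvam-30b"))       # llm
print(free_category("kimi-k2.5-fast"))   # None
print(free_category("nano-banana-pro"))  # None
```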

Pricing

All models are pay-per-use. Prices are in USD; INR equivalents (₹) are shown where listed.

Model	Input / 1M tokens	Output / 1M tokens
`kimi-k2.5-fast`	$0.52	$2.30
`sarvam-30b`	$0.35 (₹30)	$0.35 (₹30)
`sarvam-105b`	$0.35 (₹30)	$0.35 (₹30)
`openai/gpt-5.4-mini`	$1.00	$6.00
`openai/gpt-5.4`	$3.50	$20.00
`openai/gpt-5.4-pro`	$40.00	$240.00
`anthropic/claude-sonnet-4.6`	$4.00	$20.00
`anthropic/claude-opus-4.6`	$7.00	$35.00
`google/gemini-3.1-pro-preview`	$2.00	$12.00
`google/gemini-3-flash-preview`	$0.50	$3.00
`google/gemini-3.1-flash-lite`	$0.25	$1.50

STT Model	Price
`saaras:v3`	$0.53 / hour (₹45/hr)
`whisper-large-v3-turbo`	$0.06 / hour
`nova-3`	$0.50 / hour

TTS Model	Price
`bulbul:v3`	$0.53 / 10K chars (₹45/10K)
`aura-2-en`	$0.40 / 10K chars
`aura-2-es`	$0.40 / 10K chars
`melotts`	$0.05 / 10K chars

Full pricing for all models is available via the API: GET /api/v1/models

python

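import requests

# List all LLM models (this endpoint needs no authentication)
url = "https://api.callmissed.com/api/v1/models"
resp = requests.get(url, params={"category": "llm"}, timeout=10)
resp.raise_for_status()  # stop on HTTP errors instead of parsing an error body
for m in resp.json()["data"]:
    tier = "FREE" if m["free"] else "PAID"
    print(f"{m['id']}: {m['name']} ({m['context_window']} tokens) {tier}")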
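
To make the per-unit prices concrete, here is a small cost estimator using a few rows from the pricing tables above. It assumes billing is strictly linear in tokens, audio seconds, and characters, with no minimums or rounding; the page does not state this, so treat the results as rough estimates. The function names are hypothetical.

```python
# USD per 1M tokens as (input, output), copied from the LLM pricing table.
LLM_PRICES = {
    "kimi-k2.5-fast": (0.52, 2.30),
    "sarvam-30b": (0.35, 0.35),
    "openai/gpt-5.4-mini": (1.00, 6.00),
    "anthropic/claude-sonnet-4.6": (4.00, 20.00),
}
# USD per hour of audio, from the STT pricing table.
STT_PRICE_PER_HOUR = {"saaras:v3": 0.53, "whisper-large-v3-turbo": 0.06, "nova-3": 0.50}
# USD per 10K characters, from the TTS pricing table.
TTS_PRICE_PER_10K_CHARS = {"bulbul:v3": 0.53, "aura-2-en": 0.40, "aura-2-es": 0.40, "melotts": 0.05}


def llm_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one LLM call, assuming linear per-token billing."""
    in_price, out_price = LLM_PRICES[model]
    return input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price


def stt_cost(model: str, audio_seconds: float) -> float:
    """Estimated USD cost of transcribing audio of the given length."""
    return audio_seconds / 3600 * STT_PRICE_PER_HOUR[model]


def tts_cost(model: str, characters: int) -> float:
    """Estimated USD cost of synthesizing the given number of characters."""
    return characters / 10_000 * TTS_PRICE_PER_10K_CHARS[model]


# 10K input + 2K output tokens: 0.01 * $1.00 + 0.002 * $6.00
print(f"{llm_cost('openai/gpt-5.4-mini', 10_000, 2_000):.4f}")  # 0.0220
# Six minutes of audio at $0.50 per hour
print(f"{stt_cost('nova-3', 360):.4f}")                         # 0.0500
# 2,000 characters at $0.05 per 10K characters
print(f"{tts_cost('melotts', 2_000):.4f}")                      # 0.0100
```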

Fast LLMs

High-throughput Kimi K2.5 inference tier optimized for voice-agent latency.

Model ID	Status	Context	Best For
`kimi-k2.5-fast`	Under maintenance — fall back to `kimi-k2.5`	262K	Voice agents, fast inference, reasoning tasks

While kimi-k2.5-fast is in maintenance (returns HTTP 503), use kimi-k2.5:

python

# `client` is an OpenAI-compatible SDK client already configured with your CallMissed API key
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Hello"}]
)
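
If you would rather keep requesting kimi-k2.5-fast and switch automatically while it is in maintenance, a fallback wrapper along these lines is one option. This is a sketch, not CallMissed's own retry logic: ServiceUnavailable is a hypothetical exception standing in for an HTTP 503, and fake_send replaces the real API call so the example runs offline.

```python
from typing import Callable

# Documented fallback while the fast tier is in maintenance.
FALLBACKS = {"kimi-k2.5-fast": "kimi-k2.5"}


class ServiceUnavailable(Exception):
    """Hypothetical error standing in for an HTTP 503 from the API."""


def complete_with_fallback(send: Callable[[str], str], model: str) -> tuple[str, str]:
    """Call send(model); on a 503, retry once with the documented fallback model."""
    try:
        return model, send(model)
    except ServiceUnavailable:
        fallback = FALLBACKS.get(model)
        if fallback is None:
            raise  # no documented fallback for this model
        return fallback, send(fallback)


def fake_send(model: str) -> str:
    # Stand-in for a real API call: the fast tier is "in maintenance".
    if model == "kimi-k2.5-fast":
        raise ServiceUnavailable(model)
    return f"reply from {model}"


print(complete_with_fallback(fake_send, "kimi-k2.5-fast"))
# ('kimi-k2.5', 'reply from kimi-k2.5')
```

With the OpenAI Python SDK, the equivalent check would likely be catching openai.APIStatusError and inspecting its status_code for 503 before retrying.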

Indic Models

Speech to Text

Model	Description	Languages
`saaras:v3`	Latest STT — best accuracy on Indian + code-mixed	23 languages (22 Indic + English)

For 99-language general-purpose transcription, see whisper-large-v3-turbo. For diarization + smart-format on calls, see nova-3. Both are free-tier and live under the audio model routes.

Text to Speech

Model	Description	Voices
`bulbul:v3`	Natural TTS — 39 voices, 11 Indian languages	shubh (default) + 38 more

For low-latency English / Spanish voice agents, see aura-2-en / aura-2-es. For ultra-cheap en/fr notification audio, see melotts. All three are free-tier.

Chat Completion (LLM)

Model	Params	Context	Best For
`sarvam-30b`	30B MoE (2.4B active)	64K tokens	Real-time chat, Indic languages, cost-efficient
`sarvam-105b`	105B MoE	128K tokens	Complex reasoning, agentic tasks, long documents

Both sarvam-30b and sarvam-105b support hybrid thinking mode via reasoning_effort: "low" | "medium" | "high". The values "none" and "minimal" are mapped down to "low" (verified 2026-05-01), so OpenAI-style clients that send reasoning_effort: "none" to turn thinking off still receive an HTTP 200, although thinking stays on at the "low" level. To disable thinking completely, use one of the direct-routed models kimi-k2.5, kimi-k2.6, or gemma-4-26b-a4b-it.
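
The mapping described above can be mirrored on the client to predict which effort level a Sarvam request will actually run at. This is a sketch of the documented behavior, not the server's implementation. The function name is hypothetical, and raising on any other value is an assumption, since the page does not say how unknown values are handled.

```python
SUPPORTED_EFFORTS = ("low", "medium", "high")
MAPPED_TO_LOW = ("none", "minimal")


def effective_reasoning_effort(requested: str) -> str:
    """Return the effort level a sarvam-* model is documented to apply."""
    if requested in SUPPORTED_EFFORTS:
        return requested
    if requested in MAPPED_TO_LOW:
        # Documented: these values are mapped down to "low" instead of rejected.
        return "low"
    # Assumption: behavior for any other value is not documented on this page.
    raise ValueError(f"unsupported reasoning_effort: {requested!r}")


for value in ("none", "minimal", "low", "medium", "high"):
    print(value, "->", effective_reasoning_effort(value))
```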

Audio Models

Available on every plan, including the free tier. Source-of-truth pricing lives in backend/app/services/credit_service.py.

Speech to Text

Model	Languages	Best for	Price
`whisper-large-v3-turbo`	99 with auto-detect	Multilingual general-purpose; transcribe + translate	$0.06 / hour
`nova-3`	11 BCP-47 incl. `multi` auto-detect	Diarization, smart-format, streaming voice agents	$0.50 / hour

Text to Speech

Model	Languages	Voices	Price
`aura-2-en`	English	39 (luna default)	$0.40 / 10K chars
`aura-2-es`	Spanish	10 (aquila default)	$0.40 / 10K chars
`melotts`	English + French	1 per language	$0.05 / 10K chars

The following is inferred from current behavior and is not guaranteed. Aura 2 returns linear16 PCM streamed at 24 kHz, so LiveKit's voice agent can play it without an MP3 decode in the hot path. MeloTTS returns base64-encoded MP3, which is decoded with PyAV. These formats may change as upstream providers update their schemas.
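
For working with the raw audio outside a voice-agent pipeline, the sketch below wraps linear16 PCM in a WAV container and computes its duration using only the standard library. The 24 kHz rate and 16-bit sample width come from the note above; mono output is an assumption, and the helper names are hypothetical. The base64 helper only decodes the payload to MP3 bytes and does not decode the MP3 itself.

```python
import base64
import io
import wave

SAMPLE_RATE_HZ = 24_000  # from the note above
SAMPLE_WIDTH_BYTES = 2   # linear16 means 16-bit samples
CHANNELS = 1             # assumption: mono output


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of a raw linear16 buffer at the assumed rate and channel count."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw linear16 PCM in a WAV header so ordinary players can open it."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


def decode_base64_audio(payload: str) -> bytes:
    """Turn a base64 audio payload (for example MP3) into raw bytes."""
    return base64.b64decode(payload)


silence = b"\x00\x00" * SAMPLE_RATE_HZ  # one second of silent 16-bit samples
print(pcm_duration_seconds(silence))    # 1.0
print(pcm_to_wav(silence)[:4])          # b'RIFF'
```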

Direct-Routed LLMs

Low-latency models routed directly through CallMissed — sub-2s end-to-end on small prompts and free-tier eligible per the reasoning_effort matrix.

Model ID	Creator	Context
`kimi-k2.5`	Moonshot AI	262K
`kimi-k2.6`	Moonshot AI	262K
`gpt-oss-120b`	OpenAI (open-weights)	128K
`gemma-4-26b-a4b-it`	Google	128K
`glm-4.7-flash`	Zhipu	128K
`nemotron-3-super`	NVIDIA	128K
`mistral-small-3.1`	Mistral	128K

Frontier Models

Access 300+ frontier models via the same /v1/chat/completions endpoint. Use the slash-prefixed model ID as the model field.

Popular Models

Model ID	Creator	Context
`openai/gpt-5.4-pro`	OpenAI	1M
`openai/gpt-5.4-mini`	OpenAI	1M
`anthropic/claude-opus-4.6`	Anthropic	1M
`anthropic/claude-sonnet-4.6`	Anthropic	1M
`google/gemini-3.1-pro-preview`	Google	1M
`google/gemini-3-flash-preview`	Google	1M
`google/gemini-3.1-flash-lite`	Google	1M
`x-ai/grok-4.20`	xAI	256K
`qwen/qwen3.5-plus`	Qwen	256K
`auto`	Auto Router	—

Use auto to let CallMissed select the best free model for your prompt automatically.

Model Selection

Pass the model ID in your request:

python

# `client` is an OpenAI-compatible SDK client already configured with your CallMissed API key

# Sarvam LLM
response = client.chat.completions.create(
    model="sarvam-30b",
    messages=[{"role": "user", "content": "Hello in Hindi"}]
)

# Indic LLM with thinking mode
response = client.chat.completions.create(
    model="sarvam-105b",
    messages=[{"role": "user", "content": "Solve this step by step"}],
    extra_body={"reasoning_effort": "high"}
)

# Frontier model
response = client.chat.completions.create(
    model="openai/gpt-5.4-mini",
    messages=[{"role": "user", "content": "Hello"}]
)

The API automatically routes to the correct backend based on the model ID:

Bare names (kimi-k2.5, kimi-k2.6, gpt-oss-120b, gemma-4-26b-a4b-it, mistral-small-3.1, …) → direct-routed
sarvam-* prefix → Indic LLMs
Slash-prefixed (openai/, anthropic/, google/, …) → frontier catalog
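
The three rules above can be approximated client-side as follows. This is a sketch of the naming convention only, not the server's router: the function name and return labels are hypothetical, and special IDs such as auto or the audio models (for example saaras:v3) are not covered by the listed rules, so the result for them may not match what the API does.

```python
def route_for(model_id: str) -> str:
    """Guess the backend for an LLM model ID from its naming pattern."""
    if "/" in model_id:
        return "frontier"  # slash-prefixed, e.g. openai/..., anthropic/...
    if model_id.startswith("sarvam-"):
        return "indic"     # sarvam-* prefix
    return "direct"        # bare names


for model_id in ("kimi-k2.5", "sarvam-105b", "openai/gpt-5.4-mini"):
    print(model_id, "->", route_for(model_id))
# kimi-k2.5 -> direct
# sarvam-105b -> indic
# openai/gpt-5.4-mini -> frontier
```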
