MODELS

All available models on CallMissed — Indic STT/TTS/LLM, fast direct-routed models, and 300+ frontier text models. All accessible through one OpenAI-compatible API.

Overview

CallMissed provides access to a tiered model catalog through a single OpenAI-compatible API:

  • Fast LLMs — Kimi K2.5 at 414 tokens/second on Nvidia B200 GPUs. The default and fastest LLM for voice agents.
  • Indic Models — purpose-built for Indian languages. STT, TTS, and LLM optimized for Hindi, Tamil, Telugu, Bengali, and 19 more.
  • Direct-Routed LLMs — sub-2s open-weights models (Kimi K2.5/K2.6, GPT-OSS, Gemma-4, GLM, Nemotron, Mistral Small).
  • Frontier Models — 300+ models from every major creator (OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, and more) through one endpoint.

All models use the same authentication and request format. Just change the model field.
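As a minimal sketch of that point, the helper below builds an OpenAI-style chat completion body and shows that only the model field changes between tiers. The helper name build_chat_request is hypothetical and not part of any CallMissed SDK; the body shape follows the standard OpenAI chat completions format.

```python
import json


def build_chat_request(model: str, prompt: str) -> dict:
    """Return an OpenAI-style chat completion body for the given model ID."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


# Model IDs taken from the tables on this page, one per tier.
for model_id in ("kimi-k2.5", "sarvam-30b", "openai/gpt-5.4-mini"):
    print(json.dumps(build_chat_request(model_id, "Hello")))
```

Everything except the "model" value is identical across the three printed bodies.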

Models API

List all available models programmatically. No authentication required.

bash
# List all models
curl https://api.callmissed.com/api/v1/models

# Filter by category: llm, stt, tts
# (URLs with a query string are quoted so the shell does not treat "?" as a glob)
curl "https://api.callmissed.com/api/v1/models?category=llm"

# Filter free-plan models only
curl "https://api.callmissed.com/api/v1/models?free=true"

# Get a specific model
curl https://api.callmissed.com/api/v1/models/sarvam-30b

# Which models each plan tier can call
curl https://api.callmissed.com/api/v1/models/access

Response includes: id, name, description, category, owned_by, context_window, context_length (alias of context_window for OpenAI-style clients), pricing, free, supports_streaming, supports_tools, supports_reasoning, and supports_vision.
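The field list above can be consumed defensively on the client side. The sketch below parses one model entry into a small dataclass and accepts either context_window or its context_length alias. The ModelInfo class and parse_model function are hypothetical names, and the sample values are made up for illustration; only the field names come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelInfo:
    id: str
    category: str
    context_window: Optional[int]
    free: bool
    supports_streaming: bool
    supports_tools: bool


def parse_model(entry: dict) -> ModelInfo:
    """Build a ModelInfo from one entry of the models listing."""
    # context_length is documented as an alias of context_window, so accept either.
    context = entry.get("context_window", entry.get("context_length"))
    return ModelInfo(
        id=entry["id"],
        category=entry["category"],
        context_window=context,
        free=bool(entry.get("free", False)),
        supports_streaming=bool(entry.get("supports_streaming", False)),
        supports_tools=bool(entry.get("supports_tools", False)),
    )


# Illustrative entry: the values are invented, not real catalog data.
sample = {"id": "example-model", "category": "llm", "context_length": 64000, "free": True}
print(parse_model(sample))
```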

The OpenAI-compatible listing at GET /v1/models (requires Authorization: Bearer cm_*) returns the same enriched fields, and the Anthropic-shape listing at GET /anthropic/v1/models surfaces them inside Anthropic's {data, has_more, first_id, last_id} envelope.
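A minimal sketch of the authenticated listing call, using only the standard library. It builds the request without sending it. Two things here are assumptions: that the /v1/models route lives on the same host as /api/v1/models, and the placeholder key. The helper names are hypothetical.

```python
import urllib.request

# Assumption: the OpenAI-compatible routes share the host used by /api/v1/models.
BASE_URL = "https://api.callmissed.com"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/models request with a Bearer token."""
    return urllib.request.Request(
        BASE_URL + "/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def ids_from_envelope(envelope: dict) -> list:
    """Extract model IDs from a listing envelope that carries a "data" array."""
    return [m["id"] for m in envelope["data"]]


req = build_models_request("cm_your_key_here")  # placeholder key
print(req.get_method(), req.full_url)

# Hand-written envelope in the Anthropic shape described above.
envelope = {"data": [{"id": "sarvam-30b"}], "has_more": False,
            "first_id": "sarvam-30b", "last_id": "sarvam-30b"}
print(ids_from_envelope(envelope))
```

Passing the request to urllib.request.urlopen would perform the actual call.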

Free Plan Models

The free tier includes 17 models across four categories. Use GET /api/v1/models?free=true to list them, or see the Model Access by Plan page for the full breakdown.

LLM (8 models)

| Model ID | Description |
| --- | --- |
| sarvam-30b | 30B MoE: Indic languages, cost-efficient |
| sarvam-105b | 105B MoE: complex reasoning, Indic languages |
| kimi-k2.5 | Moonshot K2.5: 262K context, reasoning |
| kimi-k2.6 | Moonshot K2.6: improved reasoning + coding, 262K context |
| glm-4.7-flash | GLM 4.7 Flash: fast inference |
| gpt-oss-120b | GPT-OSS 120B: open-weights large model |
| nemotron-3-super | Nvidia Nemotron 3 Super |
| gemma-4-26b-a4b-it | Google Gemma 4 26B |

STT (3 models)

| Model ID | Description |
| --- | --- |
| saaras:v3 | 23 langs (22 Indic + English), best for code-mixed |
| whisper-large-v3-turbo | Whisper: 99 langs with auto-detect, transcribe + translate |
| nova-3 | Nova 3: 11 langs, diarization, smart-format, streaming-capable |

TTS (4 models)

| Model ID | Description |
| --- | --- |
| bulbul:v3 | 39 voices, 11 Indian languages |
| aura-2-en | Aura 2: 39 English voices, low-latency streaming |
| aura-2-es | Aura 2: 10 Spanish voices, low-latency streaming |
| melotts | MeloTTS: en + fr, cheapest TTS available |

Image (6 models)

| Model ID | Description |
| --- | --- |
| flux-2-klein-9b | Flux 2 Klein: highest quality |
| flux-2-dev | Flux 2 Dev: flagship fidelity |
| lucid-origin | Lucid Origin: cinematic |
| phoenix-1.0 | Phoenix: photorealistic |
| sdxl-lightning | SDXL Lightning: fast |
| dreamshaper-8-lcm | DreamShaper 8 LCM: fast |
| nano-banana-2 | Google Gemini 3.1 Flash Image: multimodal, highest LM-Arena Elo (paid) |
| nano-banana-pro | Google Gemini 3 Pro Image: flagship typography + fidelity (paid) |

The two entries marked (paid) are listed for reference and are not among the six free image models.

All other models (e.g. kimi-k2.5-fast, openai/*, anthropic/*, google/*, x-ai/*, qwen/*, mistralai/*) require a paid plan (Starter, Pro, or Enterprise).
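The free/paid split above can also be derived from the listing itself rather than hard-coded. The sketch below groups free models by category using the free and category fields of each entry. The function name is hypothetical, and the sample list is hand-written from IDs on this page; a real caller would pass the "data" array returned by GET /api/v1/models.

```python
from collections import defaultdict


def free_models_by_category(models: list) -> dict:
    """Group the IDs of free models by their category field."""
    grouped = defaultdict(list)
    for m in models:
        if m.get("free"):
            grouped[m["category"]].append(m["id"])
    return dict(grouped)


# Hand-written sample, not a real API response.
sample = [
    {"id": "sarvam-30b", "category": "llm", "free": True},
    {"id": "saaras:v3", "category": "stt", "free": True},
    {"id": "kimi-k2.5-fast", "category": "llm", "free": False},
]
print(free_models_by_category(sample))
```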

Pricing

All models are pay-per-use. Pricing is in USD.

| Model | Input / 1M tokens | Output / 1M tokens |
| --- | --- | --- |
| kimi-k2.5-fast | $0.52 | $2.30 |
| sarvam-30b | $0.35 (₹30) | $0.35 (₹30) |
| sarvam-105b | $0.35 (₹30) | $0.35 (₹30) |
| openai/gpt-5.4-mini | $1.00 | $6.00 |
| openai/gpt-5.4 | $3.50 | $20.00 |
| openai/gpt-5.4-pro | $40.00 | $240.00 |
| anthropic/claude-sonnet-4.6 | $4.00 | $20.00 |
| anthropic/claude-opus-4.6 | $7.00 | $35.00 |
| google/gemini-3.1-pro-preview | $2.00 | $12.00 |
| google/gemini-3-flash-preview | $0.50 | $3.00 |
| google/gemini-3.1-flash-lite | $0.25 | $1.50 |

| STT Model | Price |
| --- | --- |
| saaras:v3 | $0.53 / hour (₹45/hr) |
| whisper-large-v3-turbo | $0.06 / hour |
| nova-3 | $0.50 / hour |

| TTS Model | Price |
| --- | --- |
| bulbul:v3 | $0.53 / 10K chars (₹45/10K) |
| aura-2-en | $0.40 / 10K chars |
| aura-2-es | $0.40 / 10K chars |
| melotts | $0.05 / 10K chars |

Full pricing for all models is available via the API: GET /api/v1/models

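python
import requests

# List all LLM models (this endpoint needs no authentication)
resp = requests.get(
    "https://api.callmissed.com/api/v1/models",
    params={"category": "llm"},
    timeout=10,
)
resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
for m in resp.json()["data"]:
    plan = "FREE" if m["free"] else "PAID"
    print(f"{m['id']}: {m['name']} ({m['context_window']} tokens) {plan}")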
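The pricing tables translate into simple per-request arithmetic. The sketch below estimates cost from the listed rates, assuming charges scale linearly with usage and that there are no minimums or rounding rules; neither is stated on this page. The function names are hypothetical, and the price dictionary copies only a few rows from the LLM table.

```python
# USD per 1M tokens as (input, output), copied from the pricing table on this page.
LLM_PRICES = {
    "kimi-k2.5-fast": (0.52, 2.30),
    "sarvam-30b": (0.35, 0.35),
    "openai/gpt-5.4-mini": (1.00, 6.00),
}


def llm_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate LLM cost; assumes linear per-token billing."""
    in_price, out_price = LLM_PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def stt_cost_usd(price_per_hour: float, seconds: float) -> float:
    """Estimate STT cost from a per-hour rate; assumes per-second proration."""
    return price_per_hour * seconds / 3600


def tts_cost_usd(price_per_10k_chars: float, text: str) -> float:
    """Estimate TTS cost from a per-10K-character rate."""
    return price_per_10k_chars * len(text) / 10_000


# 2,000 input + 500 output tokens on kimi-k2.5-fast:
# (2000 * 0.52 + 500 * 2.30) / 1,000,000 = 0.00219 USD
print(round(llm_cost_usd("kimi-k2.5-fast", 2000, 500), 6))
# 30 minutes of audio on whisper-large-v3-turbo at $0.06 / hour
print(round(stt_cost_usd(0.06, 1800), 6))
```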

Fast LLMs

High-throughput Kimi K2.5 inference tier optimized for voice-agent latency.

| Model ID | Status | Context | Best For |
| --- | --- | --- | --- |
| kimi-k2.5-fast | Under maintenance; fall back to kimi-k2.5 | 262K | Voice agents, fast inference, reasoning tasks |

While kimi-k2.5-fast is in maintenance (returns HTTP 503), use kimi-k2.5:

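python
# Assumes `client` is an OpenAI SDK client already configured for CallMissed.
response = client.chat.completions.create(
    model="kimi-k2.5",  # fallback while kimi-k2.5-fast returns HTTP 503
    messages=[{"role": "user", "content": "Hello"}]
)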
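The fallback can also be automated so that callers switch back to the fast tier once maintenance ends. The sketch below tries the primary model and retries once with the fallback when the error carries HTTP status 503. The helper name is hypothetical. It reads status_code defensively: the OpenAI Python SDK exposes it on APIStatusError, but other clients may differ.

```python
def complete_with_fallback(create, primary="kimi-k2.5-fast", fallback="kimi-k2.5", **kwargs):
    """Call create(model=..., **kwargs); retry once with the fallback model on HTTP 503."""
    try:
        return create(model=primary, **kwargs)
    except Exception as exc:
        # Only a 503 triggers the fallback; every other error is re-raised unchanged.
        if getattr(exc, "status_code", None) != 503:
            raise
        return create(model=fallback, **kwargs)
```

With an OpenAI SDK client this would be called as complete_with_fallback(client.chat.completions.create, messages=[{"role": "user", "content": "Hello"}]).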

Indic Models

Speech to Text

| Model | Description | Languages |
| --- | --- | --- |
| saaras:v3 | Latest STT: best accuracy on Indian + code-mixed | 23 languages (22 Indic + English) |

For 99-language general-purpose transcription, see whisper-large-v3-turbo. For diarization + smart-format on calls, see nova-3. Both are free-tier and live under the audio model routes.

Text to Speech

| Model | Description | Voices |
| --- | --- | --- |
| bulbul:v3 | Natural TTS: 39 voices, 11 Indian languages | shubh (default) + 38 more |

For low-latency English / Spanish voice agents, see aura-2-en / aura-2-es. For ultra-cheap en/fr notification audio, see melotts. All three are free-tier.

Chat Completion (LLM)

| Model | Params | Context | Best For |
| --- | --- | --- | --- |
| sarvam-30b | 30B MoE (2.4B active) | 64K tokens | Real-time chat, Indic languages, cost-efficient |
| sarvam-105b | 105B MoE | 128K tokens | Complex reasoning, agentic tasks, long documents |

Both sarvam-30b and sarvam-105b support hybrid thinking mode via reasoning_effort: "low" | "medium" | "high". The values "none" and "minimal" are mapped to "low" (verified 2026-05-01), so OpenAI-style clients that send reasoning_effort: "none" to turn thinking off still receive a 200 response, although thinking is not fully disabled. To disable thinking completely, use the direct-routed kimi-k2.5, kimi-k2.6, or gemma-4-26b-a4b-it models.
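To predict what a sarvam-* request will actually run with, the documented mapping can be mirrored on the client. This is a sketch of the rule as described above, not the server's implementation. The function name is hypothetical, and raising ValueError for unlisted values is an assumption, since this page does not say how the API treats them.

```python
def effective_reasoning_effort(requested: str) -> str:
    """Return the reasoning_effort a sarvam-* model is documented to apply."""
    if requested in ("none", "minimal"):
        # Documented behavior: these values are mapped to "low" rather than rejected.
        return "low"
    if requested in ("low", "medium", "high"):
        return requested
    # Assumption: values outside the documented set are treated as a client error here.
    raise ValueError(f"unsupported reasoning_effort: {requested!r}")


for value in ("none", "minimal", "low", "high"):
    print(value, "->", effective_reasoning_effort(value))
```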

Audio Models

These models are available on every plan, including the free tier. The source of truth for pricing is backend/app/services/credit_service.py.

Speech to Text

| Model | Languages | Best for | Price |
| --- | --- | --- | --- |
| whisper-large-v3-turbo | 99 with auto-detect | Multilingual general-purpose; transcribe + translate | $0.06 / hour |
| nova-3 | 11 BCP-47 incl. multi auto-detect | Diarization, smart-format, streaming voice agents | $0.50 / hour |

Text to Speech

| Model | Languages | Voices | Price |
| --- | --- | --- | --- |
| aura-2-en | English | 39 (luna default) | $0.40 / 10K chars |
| aura-2-es | Spanish | 10 (aquila default) | $0.40 / 10K chars |
| melotts | English + French | 1 per language | $0.05 / 10K chars |

Note (inferred): Aura 2 returns linear16 PCM streamed at 24 kHz, so LiveKit's voice agent can play it without an MP3 decode in the hot path. MeloTTS returns base64-encoded MP3, which is decoded with pyav. This behavior is not guaranteed and may change as upstream providers update their schemas.
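For clients that want to save or inspect the raw audio, the sketch below wraps linear16 PCM in a WAV container and computes its duration using only the standard library. The 24 kHz rate and 16-bit sample width come from the note above; mono output is an assumption, as the channel count is not stated here. The function names are hypothetical.

```python
import base64
import io
import wave

SAMPLE_RATE = 24_000  # Hz, from the note above
SAMPLE_WIDTH = 2      # bytes per sample; linear16 means 16-bit PCM
CHANNELS = 1          # assumption: mono audio


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw linear16 PCM bytes in a WAV header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of raw PCM at the configured rate, width, and channel count."""
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)


def decode_base64_mp3(payload: str) -> bytes:
    """Decode a base64 string into MP3 bytes (decoding the MP3 itself is out of scope)."""
    return base64.b64decode(payload)


one_second_of_silence = b"\x00\x00" * SAMPLE_RATE
print(pcm_duration_seconds(one_second_of_silence))
print(len(pcm_to_wav(one_second_of_silence)))
```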

Direct-Routed LLMs

Low-latency models routed directly through CallMissed — sub-2s end-to-end on small prompts and free-tier eligible per the reasoning_effort matrix.

| Model ID | Creator | Context |
| --- | --- | --- |
| kimi-k2.5 | Moonshot AI | 262K |
| kimi-k2.6 | Moonshot AI | 262K |
| gpt-oss-120b | OpenAI (open-weights) | 128K |
| gemma-4-26b-a4b-it | Google | 128K |
| glm-4.7-flash | Zhipu | 128K |
| nemotron-3-super | NVIDIA | 128K |
| mistral-small-3.1 | Mistral | 128K |

Frontier Models

Access 300+ frontier models via the same /v1/chat/completions endpoint. Use the slash-prefixed model ID as the model field.

| Model ID | Creator | Context |
| --- | --- | --- |
| openai/gpt-5.4-pro | OpenAI | 1M |
| openai/gpt-5.4-mini | OpenAI | 1M |
| anthropic/claude-opus-4.6 | Anthropic | 1M |
| anthropic/claude-sonnet-4.6 | Anthropic | 1M |
| google/gemini-3.1-pro-preview | Google | 1M |
| google/gemini-3-flash-preview | Google | 1M |
| google/gemini-3.1-flash-lite | Google | 1M |
| x-ai/grok-4.20 | xAI | 256K |
| qwen/qwen3.5-plus | Qwen | 256K |
| auto | Auto Router | |

Use auto to let CallMissed select the best free model for your prompt automatically.

Model Selection

Pass the model ID in your request:

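python
from openai import OpenAI

# Assumption: the base URL and key below are placeholders inferred from the
# /v1/... routes on this page; substitute the values for your account.
client = OpenAI(base_url="https://api.callmissed.com/v1", api_key="cm_your_key_here")

# Sarvam LLM
response = client.chat.completions.create(
    model="sarvam-30b",
    messages=[{"role": "user", "content": "Hello in Hindi"}]
)

# Indic LLM with thinking mode
response = client.chat.completions.create(
    model="sarvam-105b",
    messages=[{"role": "user", "content": "Solve this step by step"}],
    extra_body={"reasoning_effort": "high"}
)

# Frontier model
response = client.chat.completions.create(
    model="openai/gpt-5.4-mini",
    messages=[{"role": "user", "content": "Hello"}]
)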

The API automatically routes to the correct backend based on the model ID:

  • Bare names (kimi-k2.5, kimi-k2.6, gpt-oss-120b, gemma-4-26b-a4b-it, mistral-small-3.1, …) → direct-routed
  • sarvam-* prefix → Indic LLMs
  • Slash-prefixed (openai/, anthropic/, google/, …) → frontier catalog
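The three routing rules above can be sketched as a small client-side classifier, which is handy for logging or for tests. It illustrates the documented rules only and is not CallMissed's router. The function name and return labels are hypothetical, and it covers only chat model IDs, so the auto router and the speech and image models are out of scope.

```python
def backend_for(model_id: str) -> str:
    """Classify a chat model ID by the routing rules listed above."""
    if "/" in model_id:
        # Slash-prefixed IDs such as openai/... go to the frontier catalog.
        return "frontier"
    if model_id.startswith("sarvam-"):
        # sarvam-* IDs go to the Indic LLMs.
        return "indic"
    # Remaining bare names are direct-routed.
    return "direct"


for model_id in ("kimi-k2.5", "sarvam-105b", "anthropic/claude-sonnet-4.6"):
    print(model_id, "->", backend_for(model_id))
```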