
Models

INDIC MODELS

Indic STT, TTS, and LLM models — optimized for Indian languages.

LLM

sarvam-30b

  • Architecture: 30B MoE, 2.4B active parameters, 128 sparse experts, GQA
  • Context: 64K tokens
  • Training: Pre-trained on 16T tokens
  • Best for: Real-time chat, Indic languages, cost-efficient reasoning
  • Thinking mode: reasoning_effort: "low" | "medium" | "high"

sarvam-105b

  • Architecture: 105B MoE, multi-head latent attention (MLA)
  • Context: 128K tokens
  • Training: Pre-trained on 12T tokens
  • Best for: Complex reasoning, agentic tasks, long documents
  • Thinking mode: reasoning_effort: "low" | "medium" | "high"
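The context windows above set a hard limit when choosing between the two models. The sketch below is a hypothetical routing helper (`pick_llm` is not part of any Sarvam SDK). It prefers the smaller sarvam-30b and falls back to sarvam-105b when the request needs more context or the caller flags it as complex. It assumes the caller already knows the prompt's token count, and it conservatively treats 64K and 128K as 64,000 and 128,000 tokens.

```python
# Context windows from the model list. Reading "64K" and "128K" as
# 64,000 and 128,000 tokens is a conservative assumption.
CONTEXT_LIMITS = {
    "sarvam-30b": 64_000,
    "sarvam-105b": 128_000,
}


def pick_llm(prompt_tokens: int, max_output_tokens: int, complex_task: bool = False) -> str:
    """Hypothetical helper: use the smaller model unless the request needs
    more context or is flagged as complex/agentic."""
    needed = prompt_tokens + max_output_tokens
    candidates = ["sarvam-105b"] if complex_task else ["sarvam-30b", "sarvam-105b"]
    for model in candidates:
        if needed <= CONTEXT_LIMITS[model]:
            return model
    raise ValueError(f"request needs {needed} tokens, more than any listed context window")


print(pick_llm(2_000, 1_000))                     # sarvam-30b
print(pick_llm(90_000, 4_000))                    # sarvam-105b
print(pick_llm(2_000, 1_000, complex_task=True))  # sarvam-105b
```

The routing rule itself is only a heuristic; "complex" is whatever the caller decides it is.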

Thinking Mode

Both sarvam-30b and sarvam-105b support hybrid thinking mode. Pass reasoning_effort through extra_body to control how much the model reasons before it answers:

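```python
from openai import OpenAI

# Assumes OPENAI_BASE_URL and OPENAI_API_KEY point at this API.
client = OpenAI()

response = client.chat.completions.create(
    model="sarvam-105b",
    messages=[{"role": "user", "content": "Solve this complex problem step by step"}],
    extra_body={"reasoning_effort": "high"},
)
```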
| Value | Description |
|---|---|
| "low" | Minimal reasoning; fastest and cheapest |
| "medium" | Balanced reasoning |
| "high" | Deep reasoning; best quality, slower |
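To keep request code consistent, the three values can be validated before a call is made. This is a minimal sketch; `chat_kwargs` and `ALLOWED_EFFORTS` are hypothetical names. It builds the keyword arguments for `client.chat.completions.create()` and places reasoning_effort in extra_body, as the Python example does.

```python
ALLOWED_EFFORTS = ("low", "medium", "high")


def chat_kwargs(model: str, prompt: str, effort: str = "medium") -> dict:
    """Hypothetical helper: build arguments for client.chat.completions.create().

    reasoning_effort goes in extra_body, which the OpenAI SDK merges into
    the JSON request body.
    """
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {ALLOWED_EFFORTS}, got {effort!r}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "extra_body": {"reasoning_effort": effort},
    }


# Usage, with a configured client:
# response = client.chat.completions.create(**chat_kwargs("sarvam-30b", "Namaste", "low"))
```

Rejecting other values on the client is stricter than the API, which accepts "none" and "minimal" for sarvam-* models.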

The sarvam-* models do not accept "none" or "minimal". The API maps both values to "low", so an OpenAI-style client that sends reasoning_effort: "none" to turn thinking off still works, although the model still reasons at low effort. Thinking can be disabled completely only on the direct-routed kimi-k2.5, kimi-k2.6, and gemma-4-26b-a4b-it models; see the reasoning_effort matrix below.
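When logging or debugging, it can help to record the effort that actually applies to a request. The sketch below mirrors the documented mapping on the client side; `effective_effort` is a hypothetical name and does not call the API. Non-sarvam models pass through unchanged because their accepted values come from the reasoning_effort matrix and are not modelled here. Raising on unknown values for sarvam-* models is an assumption, since this page does not say how the API handles them.

```python
SARVAM_EFFORTS = {"low", "medium", "high"}
# Values the API folds into "low" for sarvam-* models.
FOLDED_TO_LOW = {"none", "minimal"}


def effective_effort(model: str, requested: str) -> str:
    """Hypothetical helper: the reasoning_effort a sarvam-* model will use."""
    if not model.startswith("sarvam-"):
        return requested  # other models: see the reasoning_effort matrix
    if requested in FOLDED_TO_LOW:
        return "low"
    if requested in SARVAM_EFFORTS:
        return requested
    raise ValueError(f"unsupported reasoning_effort {requested!r} for {model}")


print(effective_effort("sarvam-30b", "none"))   # low
print(effective_effort("sarvam-105b", "high"))  # high
print(effective_effort("kimi-k2.6", "none"))    # none
```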

Speech to Text

saaras:v3

  • Languages: 23 (22 Indic + English)
  • Output modes: transcribe, translate, verbatim, translit, codemix
  • Auto language detection: yes
  • Telephony support: 8kHz audio
  • Endpoint: POST /v1/audio/transcriptions
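Because the endpoint follows the OpenAI audio path, the OpenAI Python SDK's `client.audio.transcriptions.create()` should be able to call it. This is a sketch under that assumption: `client` is taken to be an `openai.OpenAI` instance with OPENAI_BASE_URL set to this API, and the helper names are hypothetical. This page does not give the request field names for output modes or language selection, so they are omitted. `wav_sample_rate` uses the standard `wave` module to check whether a file is 8 kHz telephony audio; `wave` reads only PCM WAV, not μ-law or A-law WAV.

```python
import wave


def wav_sample_rate(path: str) -> int:
    """Return the sample rate of a PCM WAV file (8000 for typical telephony audio)."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def transcribe(client, path: str, model: str = "saaras:v3") -> str:
    """Hypothetical helper: send an audio file to POST /v1/audio/transcriptions.

    `client` is assumed to be an openai.OpenAI instance pointed at this API.
    Output mode and language options are not set here.
    """
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model=model, file=audio)
    return result.text


# Usage, with a configured client:
# if wav_sample_rate("call.wav") == 8000:
#     print(transcribe(client, "call.wav"))
```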

Supported languages include: Hindi, Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu, Urdu, Assamese, Bodo, Dogri, Kashmiri, Konkani, Maithili, Manipuri, Nepali, Sanskrit, Santali, Sindhi, and English.

Text to Speech

bulbul:v3

  • Voices: 39 speakers
  • Languages: 11
  • Audio codecs: WAV, MP3, OPUS, FLAC, AAC, Mulaw, Alaw, PCM
  • Pace: 0.5–2.0 (maps to the speed parameter)
  • Sample rates: 8000, 16000, 22050, 24000, 48000 Hz
  • Endpoint: POST /v1/audio/speech
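A speech request can follow the same OpenAI-style pattern through `client.audio.speech.create()`. This sketch makes several assumptions. `client` is an `openai.OpenAI` instance pointed at this API. Codec names are sent as lowercase `response_format` values such as "wav" or "mp3". Pace is passed as `speed`, and the helper checks the documented 0.5–2.0 range before sending. Sample-rate selection is left out because this page does not name its request field. `speech_kwargs` and `synthesize` are hypothetical names.

```python
def speech_kwargs(text: str, voice: str = "shubh", pace: float = 1.0, fmt: str = "wav") -> dict:
    """Hypothetical helper: build arguments for client.audio.speech.create() with bulbul:v3."""
    if not 0.5 <= pace <= 2.0:
        raise ValueError(f"pace must be between 0.5 and 2.0, got {pace}")
    return {
        "model": "bulbul:v3",
        "voice": voice,          # default voice from this page
        "input": text,
        "response_format": fmt,  # assumed lowercase codec name
        "speed": pace,           # pace is sent as speed
    }


def synthesize(client, text: str, path: str, **options) -> None:
    """Request speech and write the returned audio bytes to `path`."""
    response = client.audio.speech.create(**speech_kwargs(text, **options))
    with open(path, "wb") as out:
        out.write(response.content)


# Usage, with a configured client:
# synthesize(client, "Namaste", "greeting.wav", pace=1.2)
```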

Default voice: shubh. See the Voices page for the full list.
