Chat Completion
Generate text responses using our OpenAI-compatible chat completion API.
Overview
The Chat Completion API generates AI responses given a list of messages. It's fully OpenAI-compatible — use the same SDK and request format.
Endpoint: POST /v1/chat/completions
How a request flows
Every chat completion takes the same path through the platform — your app never talks to the underlying provider directly:
1. Your app: send POST /v1/chat/completions with model + messages.
2. CallMissed gateway: authenticate the cm_ key, check credits, and route by model id.
3. Provider: run inference on the best-fit backend, picked from the model id.
4. CallMissed gateway: stream tokens back and deduct credits when the response completes.
5. Your app: receive the completion (all at once, or token by token when streaming).
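As a rough illustration of step 1, the sketch below builds the raw HTTP request without any SDK and without sending it. It assumes the gateway accepts the key as a Bearer token in the Authorization header, which is how the OpenAI SDKs send it; the helper name build_chat_request is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.callmissed.com/v1"  # base URL given in this guide


def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) the POST that the app makes in step 1."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        method="POST",
        headers={
            # Assumption: Bearer auth, matching what the OpenAI SDKs send.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_chat_request(
    "cm_your_key",
    "sarvam-30b",
    [{"role": "user", "content": "Hello"}],
)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), which needs a real key and network access.
```

Steps 2 to 4 happen on the CallMissed side, so the client only ever sees this one request and its response.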
Make your first request
1. Get an API key. Create a key in the dashboard (Profile → API Keys). It looks like cm_xxxx… and is shown only once.
2. Point your SDK at CallMissed. Set the base URL to https://api.callmissed.com/v1 and pass your cm_ key. No other change to your OpenAI code is needed.
3. Send messages and read the reply. Call chat.completions.create with a model and a messages array, then read response.choices[0].message.content.
Send a single-turn or multi-turn conversation and receive a complete response. Use any OpenAI SDK — set base_url to https://api.callmissed.com/v1 and api_key to your cm_ key.
Basic Usage
```python
from openai import OpenAI

client = OpenAI(
    api_key="cm_your_key",
    base_url="https://api.callmissed.com/v1",
)

response = client.chat.completions.create(
    model="sarvam-30b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of India?"},
    ],
)
print(response.choices[0].message.content)
```

Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Model ID (e.g. sarvam-30b, openai/gpt-5.4-mini) |
| messages | array | List of {role, content} objects. The system prompt goes here as {"role": "system", "content": "..."} |
| stream | boolean | Enable streaming SSE responses |
| temperature | number | Sampling temperature (0–2) |
| max_tokens | integer | Maximum tokens to generate |
| n | integer | Number of completions to generate (default 1) |
| top_p | float | Nucleus sampling (0–1) |
| top_k | integer | Top-K sampling |
| frequency_penalty | float | Penalize tokens in proportion to how often they have already appeared (−2 to 2) |
| presence_penalty | float | Penalize tokens that have already appeared, which encourages new topics (−2 to 2) |
| repetition_penalty | float | Reduce repetition (0–2) |
| seed | integer | Seed for repeatable sampling |
| stop | array | Stop sequences |
| logit_bias | object | Token probability adjustments |
| logprobs | boolean | Return log probabilities |
| top_logprobs | integer | Top N log probs per token |
| tools | array | Tool/function definitions for function calling |
| parallel_tool_calls | boolean | Allow parallel function calls |
| response_format | object | {"type": "json_object"} or {"type": "json_schema", "json_schema": {...}} |
| structured_outputs | boolean | Enforce strict JSON schema |
| stream_options | object | {"include_usage": true} to get token counts in stream |
| reasoning_effort | string | "none" / "minimal" / "low" / "medium" / "high" (see the per-model matrix below) |
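To make the stream and stream_options parameters more concrete, here is a minimal sketch of how a client might reassemble a streamed reply from raw SSE lines. It assumes the stream follows the OpenAI chunk shape (data: lines carrying choices[].delta.content, a final usage chunk when include_usage is set, and a data: [DONE] sentinel). The sample lines are invented for illustration, not captured output.

```python
import json


def collect_stream_text(sse_lines):
    """Join the text deltas from OpenAI-style chat completion SSE lines."""
    parts = []
    usage = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # expected only with stream_options.include_usage
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts), usage


# Hypothetical stream, shaped like OpenAI chat completion chunks.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "New"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " Delhi"}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14}}',
    "data: [DONE]",
]
text, usage = collect_stream_text(sample)
print(text, usage["total_tokens"])
```

With the OpenAI SDK you would normally pass stream=True and iterate over the returned chunks instead; parsing lines by hand is mainly useful for raw HTTP clients.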
Frontier Parameters
When using slash-prefixed frontier models, these additional parameters are supported:
| Parameter | Type | Description |
|---|---|---|
| provider | object | Provider routing preferences (sort, order, only, ignore, max_price) |
| models | array | Fallback model list |
| route | string | "fallback" |
| plugins | array | [{"id": "web"}] for web search; other plugin ids: "file-parser", "response-healing", "context-compression" |
| reasoning | object | {"effort": "high", "max_tokens": 5000} |
| transforms | array | ["middle-out"] for context compression |
Note: the OpenAI Python SDK rejects provider=... (and the other parameters above) with TypeError: Completions.create() got an unexpected keyword argument 'provider'. Pass them via extra_body instead:

```python
client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[...],
    extra_body={
        "provider": {"sort": "throughput", "order": ["openai"]},
        "models": ["openai/gpt-5.4-mini"],
    },
)
```

Raw HTTP / curl users can keep provider at the top level; only the OpenAI SDK gates kwargs.

Vision (Image Input)
Multimodal content (text + image parts) is accepted on any model whose
supports_vision flag is true in GET /v1/models. Models without vision
support reject image content with 400 unsupported_image_input before the
upstream call, so you're not charged.
```python
from openai import OpenAI

client = OpenAI(api_key="cm_your_key", base_url="https://api.callmissed.com/v1")

resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",  # supports_vision: true
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
)
```

Current vision-capable models: openai/gpt-5.4-pro, openai/gpt-5.4,
openai/gpt-5.4-mini, openai/gpt-5.4-nano, anthropic/claude-opus-4.6,
anthropic/claude-sonnet-4.6, anthropic/claude-haiku-4.5,
google/gemini-3.1-pro-preview, google/gemini-3-flash-preview, google/gemini-3.5-flash, google/gemini-3.1-flash-lite,
x-ai/grok-4.20, qwen/qwen3.5-plus, qwen/qwen3.5-flash, kimi-k2.5,
kimi-k2.6, kimi-k2.7-code, gemma-4-26b-a4b-it, mistral-small-3.1,
mistralai/mistral-small-2603, auto (free plan), openrouter/auto.
Check the live GET /v1/models response for the authoritative list — it's
computed from the same set the runtime guard uses.
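One way to use that live list is a client-side check before sending image content. The sketch below assumes the GET /v1/models body uses the OpenAI list shape ({"data": [...]}) with supports_vision as a field on each model object. The sample payload and both helper names are hypothetical.

```python
def vision_model_ids(models_payload: dict) -> list:
    """Return the ids of models whose supports_vision flag is true."""
    return sorted(
        m["id"] for m in models_payload.get("data", []) if m.get("supports_vision") is True
    )


def has_image_parts(messages: list) -> bool:
    """True if any message carries an image_url content part."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, list) and any(
            isinstance(part, dict) and part.get("type") == "image_url" for part in content
        ):
            return True
    return False


# Illustrative payload only; fetch the real one from GET /v1/models.
sample_payload = {
    "object": "list",
    "data": [
        {"id": "anthropic/claude-sonnet-4.6", "supports_vision": True},
        {"id": "sarvam-30b", "supports_vision": False},
        {"id": "openai/gpt-5.4-mini", "supports_vision": True},
    ],
}
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}]

allowed = vision_model_ids(sample_payload)
model = "sarvam-30b"
if has_image_parts(messages) and model not in allowed:
    print(f"{model} would be rejected with 400 unsupported_image_input")
```

The server-side guard described above still applies; a local check like this only saves a round trip.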
Context Window
Every model in the catalog advertises a context_window (token count for the
combined prompt + completion). The GET /v1/models response exposes it under
two keys for cross-client compatibility:
- context_window (OpenAI/CallMissed canonical name)
- context_length (OpenAI SDK convention, same value)
```python
from openai import OpenAI

client = OpenAI(api_key="cm_your_key", base_url="https://api.callmissed.com/v1")

for m in client.models.list():
    extra = m.model_extra or {}
    print(m.id, extra.get("context_window"), extra.get("supports_vision"))
```

Representative context windows (treat GET /v1/models as the authoritative source; the table below is a snapshot):
| Model | context_window |
|---|---|
| openai/gpt-5.4, openai/gpt-5.4-pro, openai/gpt-5.4-mini, openai/gpt-5.4-nano | 1,048,576 |
| anthropic/claude-opus-4.6, anthropic/claude-sonnet-4.6 | 1,048,576 |
| google/gemini-3.1-pro-preview, google/gemini-3-flash-preview, google/gemini-3.5-flash, google/gemini-3.1-flash-lite | 1,048,576 |
| nemotron-3-super | 1,048,576 |
| x-ai/grok-4.20 | 262,144 |
| qwen/qwen3.5-plus, qwen/qwen3.5-flash | 262,144 |
| kimi-k2.5, kimi-k2.5-fast, kimi-k2.6, kimi-k2.7-code, glm-5.2 | 262,144 |
| sarvam-105b, gpt-oss-120b, glm-4.7-flash, gemma-4-26b-a4b-it, mistralai/mistral-small-2603 | 131,072 |
| sarvam-30b | 65,536 |
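Because context_window covers the prompt and the completion together, a request only fits if prompt tokens plus max_tokens stay within it. The sketch below shows that arithmetic using the sarvam-30b value from the snapshot table. The helper names are hypothetical, and the prompt token count is a made-up figure that you would normally get from a tokenizer or a previous response's usage.

```python
def fits_context(context_window: int, prompt_tokens: int, max_tokens: int) -> bool:
    """True if prompt + requested completion stay within the combined window."""
    return prompt_tokens + max_tokens <= context_window


def completion_budget(context_window: int, prompt_tokens: int) -> int:
    """Largest max_tokens that still fits after the prompt (never negative)."""
    return max(context_window - prompt_tokens, 0)


SARVAM_30B_WINDOW = 65_536  # snapshot value from the table; check GET /v1/models
prompt_tokens = 60_000      # hypothetical prompt size

print(fits_context(SARVAM_30B_WINDOW, prompt_tokens, 4_096))
print(fits_context(SARVAM_30B_WINDOW, prompt_tokens, 8_192))
print(completion_budget(SARVAM_30B_WINDOW, prompt_tokens))
```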
Responses API
For clients built on OpenAI's newer Responses API, CallMissed exposes a compatible POST /v1/responses endpoint. It accepts a Responses-shaped body and translates to the same chat engine under the hood — so you can point an OpenAI Responses client at https://api.callmissed.com/v1 without changes.
Endpoint: POST /v1/responses
```python
from openai import OpenAI

client = OpenAI(api_key="cm_your_key", base_url="https://api.callmissed.com/v1")

resp = client.responses.create(
    model="gpt-4.1",
    input="Write a haiku about databases.",
)
print(resp.output_text)
```

- input accepts a plain string or the Responses message-array form.
- Streaming is supported (stream: true) and emits Responses-style SSE events.
- The same models, pricing, vision, and tool-calling support as /v1/chat/completions apply; this is a request/response-shape adapter, not a different model set.
If you're starting fresh, /v1/chat/completions is the most widely supported surface; use /v1/responses when porting an existing Responses-API integration.
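For reference, the two accepted input shapes might look like the request bodies below. This is a sketch that only builds the JSON bodies; the responses_body helper is hypothetical, and the message-array form shown is the simple role/content variant from OpenAI's Responses API.

```python
import json


def responses_body(model: str, user_input, stream: bool = False) -> dict:
    """Build a /v1/responses request body; input may be a string or a message array."""
    body = {"model": model, "input": user_input}
    if stream:
        body["stream"] = True  # ask for Responses-style SSE events
    return body


# Plain-string input, as in the SDK example above.
as_string = responses_body("gpt-4.1", "Write a haiku about databases.")

# Message-array input, here with streaming enabled.
as_messages = responses_body(
    "gpt-4.1",
    [
        {"role": "system", "content": "You are a concise poet."},
        {"role": "user", "content": "Write a haiku about databases."},
    ],
    stream=True,
)
print(json.dumps(as_messages, indent=2))
```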
Error Format
All errors are returned in an OpenAI-compatible format:
```json
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
```
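A raw HTTP client could turn that body into an exception along the lines of the sketch below. The ApiError class and parse_error helper are hypothetical, and the 401 status used in the example is an assumption, since this section does not list status codes.

```python
import json


class ApiError(Exception):
    """Carries the fields of an OpenAI-style error body."""

    def __init__(self, status: int, message: str, type_: str, code: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.message = message
        self.type = type_
        self.code = code


def parse_error(status: int, body: str) -> ApiError:
    """Convert a JSON error body into an ApiError without raising it."""
    err = json.loads(body).get("error", {})
    return ApiError(status, err.get("message", ""), err.get("type"), err.get("code"))


body = '{"error": {"message": "Invalid API key", "type": "invalid_request_error", "code": "invalid_api_key"}}'
error = parse_error(401, body)  # 401 is an assumed status for this example
if error.code == "invalid_api_key":
    print("Check that the key starts with cm_ and has not been revoked:", error)
```

With the OpenAI SDK, the same fields are typically available on the exception it raises, so a helper like this is mostly relevant when calling the endpoint directly.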