Chat Completion
STREAMING
Stream chat completion responses in real-time using Server-Sent Events.
Overview
Enable streaming by setting "stream": true. The response is a Server-Sent Events (SSE) stream with Content-Type: text/event-stream.
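Outside an SDK, a streaming call is a regular POST whose JSON body includes "stream": true. The sketch below builds such a request with Python's standard library but does not send it. Two details are assumptions borrowed from OpenAI-compatible APIs: the /chat/completions path under the documented base URL and the Bearer Authorization header, which is the header the OpenAI SDK in the code example sends. build_stream_request is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: OpenAI-compatible endpoint path under the documented base URL.
BASE_URL = "https://api.callmissed.com/v1"


def build_stream_request(api_key: str, model: str, messages: list,
                         include_usage: bool = False) -> urllib.request.Request:
    """Build (but do not send) a streaming chat completion request.

    Hypothetical helper; the Bearer header and /chat/completions path
    are assumptions, not confirmed by this page.
    """
    body = {"model": model, "messages": messages, "stream": True}
    if include_usage:
        # Optional: request the extra usage chunk before [DONE]
        body["stream_options"] = {"include_usage": True}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )


req = build_stream_request("cm_your_key", "sarvam-30b",
                           [{"role": "user", "content": "Hello"}])
print(req.full_url)
print(json.loads(req.data)["stream"])
```

Passing the request to urllib.request.urlopen would send it, and the response body can then be read line by line as events arrive.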
SSE Format
Each event is a line starting with data: followed by a JSON chunk:
text
data: {"id":"...","choices":[{"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"...","choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"...","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]

- First chunk always includes {"delta": {"role": "assistant", "content": ""}}
- Content chunks carry {"delta": {"content": "token"}}
- Final chunk has {"delta": {}, "finish_reason": "stop"}
- End marker is data: [DONE]
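A client that reads the raw stream can apply the rules listed above directly. It keeps only data: lines, stops at data: [DONE], and decodes every other payload as JSON. The sketch below runs a hypothetical parse_sse_chunks helper over the sample events and joins the content deltas. It skips any other lines, such as blank separators or other SSE fields, which this page does not discuss.

```python
import json
from typing import Iterable, Iterator


def parse_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE lines until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end marker: stop without decoding
        yield json.loads(payload)


sample = [
    'data: {"id":"...","choices":[{"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"id":"...","choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"...","choices":[{"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

text = ""
for chunk in parse_sse_chunks(sample):
    for choice in chunk["choices"]:
        # The final delta has no content key, so fall back to ""
        text += choice["delta"].get("content") or ""
print(text)  # Hello
```

When the lines come from an HTTP response, they arrive as bytes, so decode each one (for example with line.decode("utf-8")) before parsing.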
Usage in Stream
To receive token usage in the stream, also set "stream_options": {"include_usage": true} in the request. The server then sends one extra chunk, with an empty choices array and a usage field, just before data: [DONE]:
json
data: {"id":"...","choices":[],"usage":{"prompt_tokens":12,"completion_tokens":34,"total_tokens":46}}
data: [DONE]
Code Example
python
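from openai import OpenAI

client = OpenAI(
    api_key="cm_your_key",
    base_url="https://api.callmissed.com/v1",
)

stream = client.chat.completions.create(
    model="sarvam-30b",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    stream_options={"include_usage": True},  # ask for a final usage chunk
)

for chunk in stream:
    # The usage chunk has an empty choices list, so check before indexing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    # Only the last chunk before [DONE] carries usage; earlier ones have None
    if chunk.usage is not None:
        print(f"\nTotal tokens: {chunk.usage.total_tokens}")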
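The loop above prints tokens as they arrive. If you instead need the complete reply, its finish reason, and the usage totals as one result, you can fold the chunks together. The sketch below works on plain dicts shaped like the SSE payloads on this page, for example the output of json.loads on each data: line. summarize_stream is a hypothetical helper that assumes a single choice per request, and the sample usage numbers are copied from the Usage in Stream example.

```python
from typing import Iterable, Optional


def summarize_stream(chunks: Iterable[dict]) -> dict:
    """Combine streamed chunk dicts into text, finish_reason, and usage.

    Hypothetical helper; assumes a single choice (with several choices
    their text would be interleaved).
    """
    parts: list[str] = []
    finish_reason: Optional[str] = None
    usage: Optional[dict] = None
    for chunk in chunks:
        # The usage chunk has an empty "choices" list, so this loop skips it
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            parts.append(delta.get("content") or "")
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
        if chunk.get("usage"):
            usage = chunk["usage"]
    return {"text": "".join(parts), "finish_reason": finish_reason, "usage": usage}


chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": ""}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
    {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 34, "total_tokens": 46}},
]
result = summarize_stream(chunks)
print(result["text"], result["finish_reason"], result["usage"]["total_tokens"])
# Hello stop 46
```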