Speech to Text
REAL-TIME STT
Real-time speech-to-text transcription via WebSocket.
Overview
Real-time STT is available through the Voice Agent WebSocket pipeline. Audio is streamed as PCM in s16le format (signed 16-bit little-endian samples, 16 kHz, mono), and transcripts are returned in real time as the user speaks.
There is no standalone real-time STT WebSocket endpoint — real-time transcription is part of the full Voice Agent pipeline (STT → LLM → TTS).
For file-based transcription, use the Speech to Text REST API.
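Browser audio APIs typically expose samples as 32-bit floats in the range -1 to 1, so they usually need converting before they match the s16le format named above. The following is a minimal sketch of that conversion; `floatToPcm16le` is a hypothetical helper name, not part of the API, and it assumes the input is already mono at 16 kHz (for example, captured through an `AudioContext` created with `sampleRate: 16000`, where the browser supports it).

```javascript
// Convert Float32 samples (range -1..1) to signed 16-bit little-endian PCM.
// Assumption: `samples` is already mono and sampled at 16 kHz.
function floatToPcm16le(samples) {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically: -1 maps to -32768, +1 maps to 32767.
    const scaled = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    view.setInt16(i * 2, scaled, true); // true = little-endian byte order
  }
  return buffer;
}
```

The returned `ArrayBuffer` can be passed directly to `WebSocket.send()`, which transmits it as a binary frame.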
Via Voice Agent
Connect to WS /ws/voice-agent, send audio chunks, and receive transcript messages:
json
{"type": "transcript", "text": "Hello, how are you?", "is_final": true}
Example
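const ws = new WebSocket(
  "wss://api.callmissed.com/ws/voice-agent?api_key=cm_your_key"
);
let llmReply = ""; // accumulates streamed LLM tokens
ws.onopen = () => {
  // Send configuration
  ws.send(JSON.stringify({
    type: "config",
    bot_id: "your-bot-id",
    stt_language: "hi-IN",
    tts_voice: "shubh",
  }));
  // Stream audio from microphone.
  // Note: MediaRecorder emits WebM-encoded audio, whereas the overview specifies
  // PCM s16le 16 kHz mono; check the Voice Agent page for the accepted input format.
  navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
    const recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
    recorder.ondataavailable = (e) => {
      // Skip empty chunks and avoid sending after the socket has closed
      if (e.data.size > 0 && ws.readyState === WebSocket.OPEN) ws.send(e.data);
    };
    recorder.start(250); // send chunks every 250ms
  });
};
ws.onmessage = (event) => {
  if (typeof event.data === "string") {
    const msg = JSON.parse(event.data);
    if (msg.type === "transcript") {
      console.log("User said:", msg.text);
    } else if (msg.type === "llm_token") {
      // Browser code has no process.stdout; collect tokens into the reply text
      llmReply += msg.token;
    }
  } else {
    // Binary data = TTS audio chunk (MP3)
    playAudio(event.data); // playAudio: application-defined playback helper (not shown)
  }
};
See the Voice Agent page for the full WebSocket protocol and all message types.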
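The transcript message shown earlier carries an `is_final` flag, which suggests a client may want to treat final and non-final results differently. The sketch below is one possible way to do that, not the documented behavior: it assumes that `is_final: false` marks an interim result that a later message replaces (this page only shows `is_final: true`), and `createTranscriptTracker` is a hypothetical helper name.

```javascript
// Tracks the running transcript from Voice Agent "transcript" messages.
// Assumption: is_final === false marks an interim result superseded by later messages.
function createTranscriptTracker() {
  const finals = []; // finalized utterances, in arrival order
  let interim = "";  // latest non-final text, if any
  return {
    // Feed one raw text frame; returns true if it was a transcript message.
    handle(raw) {
      const msg = JSON.parse(raw);
      if (msg.type !== "transcript") return false;
      if (msg.is_final) {
        finals.push(msg.text);
        interim = ""; // the final result replaces any pending interim text
      } else {
        interim = msg.text;
      }
      return true;
    },
    // Current best transcript: finalized utterances plus any pending interim text.
    text() {
      return [...finals, interim].filter(Boolean).join(" ");
    },
  };
}
```

In the `ws.onmessage` handler, string frames could be passed to `tracker.handle(event.data)` and the display refreshed from `tracker.text()`; frames of other types return `false` and can fall through to the existing handling.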