REAL-TIME STT

Real-time speech-to-text transcription via WebSocket.

Overview

Real-time STT is available through the Voice Agent WebSocket pipeline. Audio is streamed as 16 kHz mono PCM in signed 16-bit little-endian format (s16le), and transcripts are returned in real time as the user speaks.
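
Browser audio APIs usually produce 32-bit float samples in the range [-1, 1], so a conversion step is typically needed before audio matches the s16le format described above. The following is a minimal sketch of that conversion, not part of the CallMissed API. The function name `floatToS16le` is hypothetical, and the sketch assumes the samples have already been resampled to 16 kHz mono.

```javascript
// Hypothetical helper: convert Float32 samples in [-1, 1] to PCM s16le bytes.
// Assumes the input is already 16 kHz mono; resampling is not shown.
function floatToS16le(samples) {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp so out-of-range input cannot overflow the 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    const v = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
    view.setInt16(i * 2, v, true); // true = little-endian
  }
  return buffer;
}
```

The returned `ArrayBuffer` can be passed to `WebSocket.send()` as a binary frame.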

There is no standalone real-time STT WebSocket endpoint. Real-time transcription is only available as part of the full Voice Agent pipeline (STT → LLM → TTS).

For file-based transcription, use the Speech to Text REST API.

Via Voice Agent

Connect to WS /ws/voice-agent, send audio chunks, and receive transcript messages:

json
{"type": "transcript", "text": "Hello, how are you?", "is_final": true}
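
The `is_final` flag suggests that a transcript message can be either interim or final. Below is a minimal sketch of one way a client might keep a running transcript. It assumes (this page does not confirm it) that messages with `is_final: false` are interim results superseded by later messages. `createTranscriptBuffer` is a hypothetical helper name.

```javascript
// Hypothetical helper: keeps finalized text separate from the latest interim text.
// Assumes interim messages (is_final: false) are replaced by later messages.
function createTranscriptBuffer() {
  const finals = [];
  let interim = "";
  return {
    push(msg) {
      if (msg.type !== "transcript") return; // ignore other message types
      if (msg.is_final) {
        finals.push(msg.text);
        interim = ""; // the final result supersedes the pending interim text
      } else {
        interim = msg.text;
      }
    },
    text() {
      // Finalized segments followed by the current interim text, if any.
      return [...finals, interim].filter(Boolean).join(" ");
    },
  };
}
```

In the `onmessage` handler, each parsed message could be passed to `push()`, with `text()` used to render the transcript so far.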

Example

const ws = new WebSocket(
  "wss://api.callmissed.com/ws/voice-agent?api_key=cm_your_key"
);

ws.onopen = () => {
  // Send configuration
  ws.send(JSON.stringify({
    type: "config",
    bot_id: "your-bot-id",
    stt_language: "hi-IN",
    tts_voice: "shubh",
  }));

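  // Stream audio from the microphone.
  // MediaRecorder emits WebM-encoded audio (typically Opus), not raw PCM;
  // check the Voice Agent page for the formats the server accepts.
  navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
    const recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
    recorder.ondataavailable = (e) => {
      // Skip empty chunks and avoid sending after the socket has closed
      if (e.data.size > 0 && ws.readyState === WebSocket.OPEN) ws.send(e.data);
    };
    recorder.start(250); // emit a chunk every 250 ms
  }).catch((err) => console.error("Microphone access failed:", err));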
};

ws.onmessage = (event) => {
  if (typeof event.data === "string") {
    const msg = JSON.parse(event.data);
    if (msg.type === "transcript") {
      console.log("User said:", msg.text);
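    } else if (msg.type === "llm_token") {
      // Streamed LLM response token
      console.log("LLM token:", msg.token);
    }
  } else {
    // Binary data = TTS audio chunk (MP3).
    // playAudio is an application-defined helper, not shown here.
    playAudio(event.data);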
  }
};
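
The example emits a chunk every 250 ms. For raw PCM s16le at 16 kHz mono, the size of such a chunk follows from the format: 16000 samples/s × 2 bytes × 0.25 s = 8000 bytes. The sketch below splits a PCM buffer into fixed-duration chunks on that basis. `splitPcm` is a hypothetical helper, and this page does not state whether the server requires any particular chunk size.

```javascript
const SAMPLE_RATE = 16000;  // Hz, from the overview
const BYTES_PER_SAMPLE = 2; // s16le = 16 bits per sample

// Hypothetical helper: split a PCM s16le ArrayBuffer into chunks of chunkMs each.
// The last chunk may be shorter than the others.
function splitPcm(buffer, chunkMs = 250) {
  const chunkBytes =
    Math.floor((SAMPLE_RATE * chunkMs) / 1000) * BYTES_PER_SAMPLE;
  if (chunkBytes <= 0) {
    throw new RangeError("chunkMs must be large enough to hold one sample");
  }
  const chunks = [];
  for (let offset = 0; offset < buffer.byteLength; offset += chunkBytes) {
    // slice() clamps the end index, so the final chunk is handled automatically.
    chunks.push(buffer.slice(offset, offset + chunkBytes));
  }
  return chunks;
}
```

Each chunk could then be sent with `ws.send(chunk)`, paced by a timer if the audio comes from a file rather than a live source.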

See the Voice Agent page for the full WebSocket protocol and all message types.
