Rate Limits & Quotas

How CallMissed limits request rates: global per-IP limits, per-key RPM (requests per minute), response headers, and how to handle 429 responses.

Limit Layers

Requests pass through several limits, in order:

| Layer | Limit | Scope |
| --- | --- | --- |
| Global middleware | 200 requests / minute | per IP |
| Auth endpoints | 5–10 requests / minute | per IP (login, register, refresh, OTP) |
| Per-key RPM | plan defaults: Free 60 · Starter 500 · Pro 3,000 · Enterprise 10,000 (override per key) | per API key |
| Monthly budget | configurable credit cap | per tenant / per key |
| Plan limits | tier-based caps on LLM/STT/TTS calls, conversations, storage, team size | per tenant |

Set a per-key RPM and a budget cap when issuing keys, and check live consumption with GET /api/v1/keys/:id/rate-state.
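As a rough illustration of polling the rate-state endpoint mentioned above, the sketch below builds and sends the GET request using only the Python standard library. The endpoint path comes from this page; the base URL (`BASE_URL`), the bearer-token `Authorization` header, and the function names are assumptions made for the example. The response schema is not documented here, so the decoded JSON is returned untouched.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: placeholder host. Substitute your real CallMissed API base URL.
BASE_URL = "https://api.example.com"


def build_rate_state_request(key_id: str, token: str) -> urllib.request.Request:
    """Build a GET request for /api/v1/keys/:id/rate-state."""
    # Percent-encode the key ID so it is safe to place in the URL path.
    url = f"{BASE_URL}/api/v1/keys/{quote(key_id, safe='')}/rate-state"
    # Assumption: bearer-token auth. Check the authentication docs for the real scheme.
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, method="GET", headers=headers)


def fetch_rate_state(key_id: str, token: str):
    """Send the request and return the decoded JSON body without interpreting it."""
    request = build_rate_state_request(key_id, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the URL and header logic easy to check without touching the network.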

Response Headers

Rate-limited responses include standard headers so you can pace requests:

| Header | Meaning |
| --- | --- |
| Retry-After | Seconds to wait before retrying (on 429) |
| X-RateLimit-Limit | The ceiling for the current window |
| X-RateLimit-Remaining | Requests left in the window |
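One way a client might turn these headers into a pacing decision is sketched below. The header names are the ones listed above; the function names and the policy itself are assumptions for illustration. In particular, waiting a full 60-second window when `X-RateLimit-Remaining` reaches zero is a guess based on the per-minute limits described on this page, not documented behavior.

```python
from typing import Mapping, Optional


def _to_number(value: Optional[str]) -> Optional[float]:
    """Parse a header value as a number; return None if missing or not numeric."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def pacing_delay(headers: Mapping[str, str], window_seconds: float = 60.0) -> float:
    """Return how many seconds to pause before sending the next request."""
    # HTTP header names are case-insensitive, so normalize them first.
    lookup = {name.lower(): value for name, value in headers.items()}

    # Retry-After takes priority: the server has stated how long to wait.
    retry_after = _to_number(lookup.get("retry-after"))
    if retry_after is not None:
        return max(0.0, retry_after)

    # Assumption: with no requests left, wait out one full window.
    remaining = _to_number(lookup.get("x-ratelimit-remaining"))
    if remaining is not None and remaining <= 0:
        return window_seconds

    return 0.0
```

For example, `pacing_delay({"Retry-After": "7"})` returns `7.0`, while headers that report requests remaining return `0.0`.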

Handling 429

When you receive 429 Too Many Requests:

  1. Read Retry-After and wait at least that long.
  2. Use exponential backoff with jitter for repeated 429s.
  3. Spread bursty workloads across time, or request a higher per-key RPM.
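The first two steps above can be sketched as a small retry wrapper. This is a minimal illustration, not an official client: `send` is a hypothetical callable that performs one request and returns `(status, headers)`, and the base delay, 60-second cap, and attempt count are arbitrary example values. The random jitter is added on top of `Retry-After`, so the wait is never shorter than the server asked for.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Read Retry-After as seconds; return None if absent or not numeric."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return None
    return None


def backoff_delay(attempt: int, retry_after: Optional[float],
                  base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Exponential backoff with jitter, added on top of any Retry-After value."""
    # The jitter ceiling doubles on each attempt, up to the cap.
    ceiling = min(cap, base * 2 ** attempt)
    return (retry_after or 0.0) + rng() * ceiling


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        # Return on any non-429 status, or hand back the last 429 when out of attempts.
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        sleep(backoff_delay(attempt, retry_after_seconds(headers)))
```

Passing `sleep` and `rng` as parameters makes the timing logic testable without real delays. If the final attempt still returns 429, the caller receives that response and can decide whether to queue the work or request a higher per-key RPM.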

See Error Codes for the full status/code reference.
