KIMI K2.5 FAST (MAINTENANCE)

High-throughput Kimi K2.5 inference tier — currently under maintenance. Use kimi-k2.5 in the meantime.

> Under maintenance. kimi-k2.5-fast is temporarily unavailable. Requests return HTTP 503 with code: "model_under_maintenance". Use `kimi-k2.5` for production traffic; both ride on the same Kimi K2.5 model from Moonshot AI.

Overview

The kimi-k2.5-fast tier targets ultra-low-latency voice-agent workloads and is served through a high-throughput inference partner. While it is under maintenance, route the same workload through kimi-k2.5: the model and tokeniser are identical, and only inference latency differs.
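The routing rule above can be sketched as a small helper that tries the fast tier and retries once on kimi-k2.5 when it sees the maintenance response. This is a minimal sketch, not an official client: `send` is a hypothetical transport function that you supply, assumed to return the HTTP status, the error code (or `None`), and the response body. The exact location of the `model_under_maintenance` code in the error payload is not specified on this page, so extracting it is left to `send`.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

PRIMARY_MODEL = "kimi-k2.5-fast"
FALLBACK_MODEL = "kimi-k2.5"
MAINTENANCE_CODE = "model_under_maintenance"

Messages = List[Dict[str, str]]
# Hypothetical transport signature (an assumption for this sketch):
# send(model, messages) -> (http_status, error_code_or_None, body)
SendFn = Callable[[str, Messages], Tuple[int, Optional[str], Any]]


def is_maintenance(status: int, error_code: Optional[str]) -> bool:
    """True only for the documented maintenance response (503 + code)."""
    return status == 503 and error_code == MAINTENANCE_CODE


def complete_with_fallback(send: SendFn, messages: Messages) -> Tuple[str, Any]:
    """Try the fast tier; on the maintenance 503, retry once on the fallback.

    Returns the model that actually served the request and its body.
    """
    model = PRIMARY_MODEL
    status, code, body = send(model, messages)
    if is_maintenance(status, code):
        model = FALLBACK_MODEL
        status, code, body = send(model, messages)
    if status != 200:
        # Any other failure is surfaced to the caller unchanged.
        raise RuntimeError(f"{model} failed: HTTP {status} ({code})")
    return model, body
```

Because the helper only falls back on the specific maintenance code, other 503s (for example, a genuine outage) still surface as errors rather than being silently rerouted.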

Kimi K2.5 Fast

| Field | Value |
| --- | --- |
| Model ID | kimi-k2.5-fast |
| Status | Under maintenance (returns 503) |
| Recommended fallback | kimi-k2.5 |
| Architecture | MoE (Mixture of Experts) |
| Context window | 262,144 tokens |
| Supports streaming | Yes |
| Supports tools | Yes |

Kimi K2.5 (by Moonshot AI) is a 1T-parameter MoE model with 32B active parameters. It excels at reasoning, coding, and multilingual tasks.

Usage

While kimi-k2.5-fast is in maintenance, point your code at kimi-k2.5:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.callmissed.com/v1",
    api_key="cm_your_api_key",
)

response = client.chat.completions.create(
    model="kimi-k2.5",  # kimi-k2.5-fast is under maintenance
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing briefly."},
    ],
    stream=True,
)

# Print each streamed text delta as it arrives.
for chunk in response:
    if not chunk.choices:  # skip chunks that carry no choices, if the server sends any
        continue
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```

Pricing

| Direction | Cost per 1M tokens |
| --- | --- |
| Input | $0.52 |
| Output | $2.30 |

Credits: 1 credit = $0.01. A typical voice agent turn (500 input + 200 output tokens) costs approximately 0.07 credits. Pricing applies once kimi-k2.5-fast returns from maintenance; kimi-k2.5 is priced separately on its model page.
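The per-turn figure can be reproduced from the listed rates. The sketch below is illustrative only: the function and constant names are made up for this example, and the rates are the kimi-k2.5-fast prices from the table on this page.

```python
# Rates from the pricing table on this page (USD per 1M tokens).
INPUT_USD_PER_M = 0.52
OUTPUT_USD_PER_M = 2.30
USD_PER_CREDIT = 0.01  # 1 credit = $0.01


def turn_cost_credits(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in credits, given its token counts."""
    usd = (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000
    return usd / USD_PER_CREDIT


# Typical voice agent turn: 500 input + 200 output tokens.
print(f"{turn_cost_credits(500, 200):.3f} credits")  # prints "0.072 credits"
```

That is $0.00026 for input plus $0.00046 for output, or $0.00072 per turn, which rounds to the 0.07 credits quoted above.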
