Latency and Performance
Practical ways to keep TheRouter.ai responses fast
TheRouter.ai adds minimal gateway overhead, but total latency is still shaped by model choice, provider health, prompt size, and cache behavior.
Reference payload
Use this baseline request shape and adapt model, provider sort strategy, and token limits to your workload.
request.json
curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THEROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"openrouter/auto:floor","messages":[{"role":"user","content":"ping"}]}'
Configuration examples
TheRouter.ai keeps request semantics consistent across providers, so you can tune behavior without rewriting your app layer.
TypeScript
const res = await fetch("https://api.therouter.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer <THEROUTER_API_KEY>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "openrouter/auto:nitro",
    provider: { sort: "throughput", allow_fallbacks: true },
    messages: [{ role: "user", content: "Summarize in 3 bullets" }],
  }),
});
Production note
Operate with guardrails
A low credit balance and cold caches can temporarily increase latency. Keep your account balance healthy and warm key endpoints after deploys.
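One way to warm endpoints after a deploy is to fire a few tiny requests at the models your app uses most. A minimal sketch in TypeScript, assuming Node 18+ with a global fetch and an API key in THEROUTER_API_KEY; the model list and max_tokens cap are illustrative, not prescribed by TheRouter.ai:

```typescript
// Build a minimal "ping" completion request for one model.
// max_tokens: 1 keeps warmup cost negligible.
function buildWarmupBody(model: string): string {
  return JSON.stringify({
    model,
    max_tokens: 1,
    messages: [{ role: "user", content: "ping" }],
  });
}

// Fire one warmup request per model after a deploy. Errors are
// logged rather than thrown: warmup is best-effort.
async function warmEndpoints(models: string[]): Promise<void> {
  await Promise.all(
    models.map(async (model) => {
      try {
        const res = await fetch("https://api.therouter.ai/v1/chat/completions", {
          method: "POST",
          headers: {
            Authorization: `Bearer ${process.env.THEROUTER_API_KEY}`,
            "Content-Type": "application/json",
          },
          body: buildWarmupBody(model),
        });
        console.log(`warmup ${model}: ${res.status}`);
      } catch (err) {
        console.error(`warmup ${model} failed`, err);
      }
    }),
  );
}
```

Running warmEndpoints from a post-deploy hook means the first real user request does not pay the cold-start cost.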
Use the activity feed and usage exports to validate that these settings improve reliability and cost in your real traffic mix.
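Alongside the activity feed, you can time requests directly to see how a routing change affects your own traffic. A sketch, assuming Node 18+ with a global fetch and performance.now(); the endpoint and model slugs mirror the examples above, and the wrapper itself is a hypothetical helper:

```typescript
// Wrap an async call and report how long it took, so two
// routing strategies can be compared on the same prompt.
async function timed<T>(
  label: string,
  fn: () => Promise<T>,
): Promise<{ value: T; ms: number }> {
  const start = performance.now();
  const value = await fn();
  const ms = performance.now() - start;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
  return { value, ms };
}

// Example: measure the floor-price and throughput-optimized
// routes back to back with an identical request body.
async function compareStrategies(): Promise<void> {
  for (const model of ["openrouter/auto:floor", "openrouter/auto:nitro"]) {
    await timed(model, () =>
      fetch("https://api.therouter.ai/v1/chat/completions", {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.THEROUTER_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          model,
          messages: [{ role: "user", content: "Summarize in 3 bullets" }],
        }),
      }).then((r) => r.json()),
    );
  }
}
```

A handful of timed runs per strategy, taken at the same time of day, gives a rough but honest picture; usage exports then confirm the cost side of the trade-off.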