Latency and Performance

Practical ways to keep TheRouter.ai responses fast

TheRouter.ai adds minimal gateway overhead, but total latency is still shaped by model choice, provider health, prompt size, and cache behavior.

Reference payload

Use this baseline request shape and adapt model, provider sort strategy, and token limits to your workload.

request.sh
curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THEROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"openrouter/auto:floor","messages":[{"role":"user","content":"ping"}]}'
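To compare round-trip latency across model variants programmatically, a minimal sketch building on the baseline request above (the `buildPingPayload` and `timedPing` helpers are illustrative, not part of the API):

```typescript
// Baseline payload from the reference request; swap the model string
// to compare variants such as :floor vs :nitro.
function buildPingPayload(model = "openrouter/auto:floor") {
  return {
    model,
    messages: [{ role: "user" as const, content: "ping" }],
  };
}

// Time a single round trip through the gateway, in milliseconds.
async function timedPing(apiKey: string, model?: string): Promise<number> {
  const start = Date.now();
  await fetch("https://api.therouter.ai/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildPingPayload(model)),
  });
  return Date.now() - start;
}
```

Run several pings and compare medians rather than single samples; one-off measurements are dominated by cold caches and connection setup.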

Configuration examples

TheRouter.ai keeps request semantics consistent across providers, so you can tune behavior without rewriting your app layer.

TypeScript
const res = await fetch("https://api.therouter.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer <THEROUTER_API_KEY>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "openrouter/auto:nitro", // :nitro variant prioritizes speed
    provider: { sort: "throughput", allow_fallbacks: true }, // prefer fast providers, keep fallbacks on
    messages: [{ role: "user", content: "Summarize in 3 bullets" }],
  }),
});
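Different workloads want different sort strategies. A small helper can keep that choice in one place; this is a sketch, and the `"latency"` and `"price"` sort values are assumed variants alongside the `"throughput"` value shown above:

```typescript
// "throughput" appears in the example above; "latency" and "price" are
// assumed here as the other common sort strategies.
type SortStrategy = "throughput" | "latency" | "price";
type Workload = "interactive" | "batch" | "bulk";

// Map a workload to provider preferences: interactive traffic favors
// low latency, batch jobs favor price, bulk summarization favors throughput.
function providerPrefs(workload: Workload) {
  const sort: SortStrategy =
    workload === "interactive"
      ? "latency"
      : workload === "batch"
        ? "price"
        : "throughput";
  return { sort, allow_fallbacks: true };
}
```

Keeping `allow_fallbacks: true` in every branch trades a little tail latency for resilience when a preferred provider degrades.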

Production note

Operate with guardrails
A low credit balance and cold caches can both temporarily increase latency. Keep your account balance healthy, and warm key endpoints after each deploy so the first real user request does not pay the cold-start cost.
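The warm-up advice above can be sketched as a post-deploy step. The model list is a placeholder for whichever endpoints your traffic actually hits, and `warmupBody` is an illustrative helper:

```typescript
// Models whose caches we want warm after a deploy (illustrative list).
const WARM_MODELS = ["openrouter/auto:floor", "openrouter/auto:nitro"];

// A one-token request keeps the warm-up cost minimal.
function warmupBody(model: string) {
  return {
    model,
    max_tokens: 1,
    messages: [{ role: "user" as const, content: "ping" }],
  };
}

// Fire one cheap request per model in parallel; warm-up failures are
// swallowed so they never block the deploy pipeline.
async function warmEndpoints(apiKey: string): Promise<void> {
  await Promise.all(
    WARM_MODELS.map((model) =>
      fetch("https://api.therouter.ai/v1/chat/completions", {
        method: "POST",
        headers: {
          Authorization: `Bearer ${apiKey}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(warmupBody(model)),
      }).catch(() => undefined)
    )
  );
}
```

Call `warmEndpoints` from your deploy hook or a post-release job so caches are populated before user traffic arrives.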

Use the activity feed and usage exports to validate that these settings improve reliability and cost in your real traffic mix.