Cost Optimization

Cut AI inference costs by up to 85%

TheRouter automatically routes each AI request to the lowest-cost healthy provider for the requested model. For example, SiliconFlow serves DeepSeek R1 at $0.55/$2.19 per MTok (input/output) versus $3.50/$17.50 on Bedrock, about 84% cheaper, and DeepSeek V3.2 at $0.14/$0.28 versus $0.90/$2.70, about 85% cheaper.

When SiliconFlow is available for DeepSeek and Qwen models, your costs drop 40–85% versus US-region providers, with automatic failover. No code changes are required, and the standard model name stays the same.

Real cost savings — live data

SiliconFlow prices vs AWS Bedrock for the same models. Prices in $/MTok (input / output).

Model                     Bedrock (primary)    SiliconFlow (optimized)    Savings
deepseek/deepseek-r1      $3.50 / $17.50       $0.55 / $2.19              84%
deepseek/deepseek-v3.2    $0.90 / $2.70        $0.14 / $0.28              85%
qwen/qwen3-235b           $0.47 / $1.39        $0.14 / $0.55              60%
qwen/qwen3-32b            $0.20 / $0.80        $0.03 / $0.07              85%

Prices as of March 2026. Bedrock on-demand pricing.
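Because input and output tokens are priced separately, the saving on a real workload depends on its input/output mix. The sketch below estimates it from the table prices. The function names and the 100 MTok in / 20 MTok out workload are illustrative assumptions, not part of TheRouter's API.

```python
# Prices in $ per million tokens (input, output), copied from the table above.
PRICES = {
    "deepseek/deepseek-r1": {"bedrock": (3.50, 17.50), "siliconflow": (0.55, 2.19)},
    "deepseek/deepseek-v3.2": {"bedrock": (0.90, 2.70), "siliconflow": (0.14, 0.28)},
    "qwen/qwen3-235b": {"bedrock": (0.47, 1.39), "siliconflow": (0.14, 0.55)},
    "qwen/qwen3-32b": {"bedrock": (0.20, 0.80), "siliconflow": (0.03, 0.07)},
}


def workload_cost(model: str, provider: str, input_mtok: float, output_mtok: float) -> float:
    """Cost in dollars for a workload measured in millions of tokens."""
    input_price, output_price = PRICES[model][provider]
    return input_price * input_mtok + output_price * output_mtok


def savings_percent(model: str, input_mtok: float, output_mtok: float) -> float:
    """Percentage saved by running the workload on SiliconFlow instead of Bedrock."""
    bedrock = workload_cost(model, "bedrock", input_mtok, output_mtok)
    siliconflow = workload_cost(model, "siliconflow", input_mtok, output_mtok)
    return 100.0 * (1.0 - siliconflow / bedrock)


# Hypothetical monthly workload: 100 MTok of input, 20 MTok of output.
for model in PRICES:
    print(f"{model}: {savings_percent(model, 100, 20):.1f}% saved")
```

For deepseek/deepseek-r1, the table prices work out to roughly 84% savings on input tokens and 87% on output tokens, so output-heavy workloads save slightly more than the headline figure.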

How cost routing works

1. Your request arrives

You call api.therouter.ai/v1/chat/completions with a standard model ID like deepseek/deepseek-v3.2.

2. Route selection

TheRouter checks provider health and cost. If SiliconFlow is healthy, the request goes there: it is the lower-cost route, configured at priority 1. Bedrock, configured at priority 0, is the primary route and serves as the fallback when SiliconFlow is unavailable.

3. Normalized response

The response arrives with the standard model name in the model field, never the provider-specific name. reasoning_content is preserved for DeepSeek R1.
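The route-selection step can be pictured with a minimal sketch. It is an illustration under stated assumptions, not TheRouter's actual implementation: the Route type, the select_route function, and the rule of comparing routes by combined input and output price are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    provider: str
    priority: int        # as described above: Bedrock 0, SiliconFlow 1
    input_price: float   # $ per MTok
    output_price: float  # $ per MTok
    healthy: bool = True


def select_route(routes: list[Route]) -> Route:
    """Pick the cheapest healthy route; raise if no route is healthy."""
    candidates = [r for r in routes if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy provider route available")
    # Assumption: rank by input + output price, breaking ties on priority.
    return min(candidates, key=lambda r: (r.input_price + r.output_price, r.priority))


# Routes for deepseek/deepseek-v3.2, using the prices from the table.
routes = [
    Route("bedrock", 0, 0.90, 2.70),
    Route("siliconflow", 1, 0.14, 0.28),
]
print(select_route(routes).provider)  # siliconflow

# If SiliconFlow is marked unhealthy, the request falls back to Bedrock.
degraded = [
    Route("bedrock", 0, 0.90, 2.70),
    Route("siliconflow", 1, 0.14, 0.28, healthy=False),
]
print(select_route(degraded).provider)  # bedrock
```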

Zero code changes

Switch from OpenAI to cost-optimized routing by pointing the OpenAI client at TheRouter: change the API key and base URL, then request a standard model ID.

from openai import OpenAI

# Before: OpenAI directly
# client = OpenAI(api_key="sk-...", base_url="https://api.openai.com/v1")

# After: TheRouter with cost optimization
client = OpenAI(
    api_key="YOUR_THEROUTER_API_KEY",        # ← changed
    base_url="https://api.therouter.ai/v1",  # ← changed
)

# Same request code, up to 85% lower cost for DeepSeek models
response = client.chat.completions.create(
    model="deepseek/deepseek-v3.2",
    messages=[{"role": "user", "content": "Summarize this document."}],
)
print(response.choices[0].message.content)

Common questions

How does TheRouter reduce AI inference costs?

TheRouter maintains multiple provider routes for each model and automatically selects the most cost-effective one. For models served by more than one provider, such as DeepSeek and Qwen, SiliconFlow's China-region infrastructure offers 40–85% lower pricing than US providers like AWS Bedrock.

Will my application break when the provider switches?

No. TheRouter normalizes responses across all providers to a consistent OpenAI-compatible format. The model name in the response always reflects the standard model ID you requested — never the internal provider or upstream model name.
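As a rough illustration of what this normalization implies, the sketch below rewrites the model field of an upstream response while leaving the other fields, including reasoning_content, untouched. The normalize_response function and the upstream payload (including its provider-specific model name) are invented for the example and do not describe TheRouter's internals.

```python
import copy


def normalize_response(upstream: dict, requested_model: str) -> dict:
    """Return a copy of an upstream chat completion with the standard model ID."""
    normalized = copy.deepcopy(upstream)
    # Report the ID the caller asked for, never the provider-specific name.
    normalized["model"] = requested_model
    return normalized


# Hypothetical upstream payload; the provider-specific model name is made up.
upstream = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "model": "provider-internal/DeepSeek-R1",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "42",
                "reasoning_content": "The user asked for ...",
            },
            "finish_reason": "stop",
        }
    ],
}

normalized = normalize_response(upstream, "deepseek/deepseek-r1")
print(normalized["model"])  # deepseek/deepseek-r1
```

Client code that reads response.model or the message content therefore sees the same shape regardless of which provider served the request.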

Can I control which provider is used?

Yes. You can set a provider preference per API key from the dashboard — Auto (cost-optimized routing), US-optimized (Bedrock primary), or China-optimized (SiliconFlow primary). The default auto mode picks the lowest-cost available route.
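One way to picture the three modes is as different orderings of the same provider list. The sketch below is hypothetical: the enum, its values, and order_providers are illustrative names, and the preference itself is set in the dashboard rather than in code.

```python
from enum import Enum


class ProviderPreference(Enum):
    AUTO = "auto"
    US_OPTIMIZED = "us-optimized"
    CHINA_OPTIMIZED = "china-optimized"


# Assumption: each non-auto mode pins one provider as primary.
PRIMARY_PROVIDER = {
    ProviderPreference.US_OPTIMIZED: "bedrock",
    ProviderPreference.CHINA_OPTIMIZED: "siliconflow",
}


def order_providers(preference: ProviderPreference, cost_per_mtok: dict[str, float]) -> list[str]:
    """Return providers in the order they would be tried under a preference."""
    by_cost = sorted(cost_per_mtok, key=cost_per_mtok.get)
    primary = PRIMARY_PROVIDER.get(preference)
    if primary is None or primary not in cost_per_mtok:
        return by_cost  # auto mode: cheapest first
    return [primary] + [p for p in by_cost if p != primary]


# Input prices for deepseek/deepseek-v3.2 from the pricing table.
costs = {"bedrock": 0.90, "siliconflow": 0.14}
print(order_providers(ProviderPreference.AUTO, costs))          # ['siliconflow', 'bedrock']
print(order_providers(ProviderPreference.US_OPTIMIZED, costs))  # ['bedrock', 'siliconflow']
```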

Does cost optimization affect response quality?

No. TheRouter routes to the same model via a different inference provider. The model weights and capabilities are identical; only the infrastructure differs. You get the same DeepSeek R1 model whether it runs on Bedrock or SiliconFlow.