DeepSeek V4 Now Available on TheRouter — Direct API Integration
DeepSeek released V4 Flash and V4 Pro today — their most powerful open-source models to date. Both are already live on TheRouter with day-one support.
Both arrive via direct DeepSeek API integration. V4 Flash: 284B MoE with 13B active parameters, 1M context, 384K max output, $0.14/$0.28 per MTok. V4 Pro: 1.6T MoE with 49B active parameters, 1M context, 384K max output, $1.74/$3.48 per MTok. Both models feature Hybrid Attention Architecture and Engram conditional memory, and are Apache 2.0 licensed with weights on Hugging Face. Model IDs: deepseek/deepseek-v4-flash, deepseek/deepseek-v4-pro.
V4 Flash — Best Value for Everyday Tasks
- 284B MoE, 13B active — Mixture of Experts architecture with only 13B parameters active per forward pass, keeping inference fast and cost low.
- 1M context, 384K max output — process entire codebases or long documents in a single request with massive output capacity.
- Default thinking mode — built-in chain-of-thought reasoning enabled by default for better accuracy.
- $0.14 / $0.28 per MTok (input/output) — among the most affordable reasoning models available.
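The context and output limits above imply a simple token budget per request. A rough sketch (assuming the 1M window covers input plus output combined; exact accounting can vary by provider):

```python
# Rough token budgeting for V4 Flash's 1M-token context window.
# Assumption: the context window covers input + output tokens combined;
# provider-side accounting may differ.
CONTEXT_WINDOW = 1_000_000
MAX_OUTPUT = 384_000

def max_input_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Tokens left for the prompt after reserving room for the reply."""
    if reserved_output > MAX_OUTPUT:
        raise ValueError(f"reserved_output cannot exceed {MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output

print(max_input_tokens())      # 616000 with the full 384K output reserved
print(max_input_tokens(4096))  # 995904 when a short reply suffices
```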
V4 Pro — Complex Reasoning Powerhouse
- 1.6T MoE, 49B active — the largest open-source MoE model released to date, approaching the performance of Claude Opus 4.6 in non-thinking mode.
- 1M context, 384K max output — same generous context and output limits as V4 Flash.
- $1.74 / $3.48 per MTok (input/output) — competitive pricing for a model at this capability level.
Benchmarks
| Benchmark | V4 Pro | V4 Flash | Claude Opus 4.6 |
|---|---|---|---|
| SWE-bench Verified | 80.6% | 79.0% | 80.8% |
| LiveCodeBench | 93.5 | — | — |
| Codeforces Rating | 3206 | — | — |
V4 Pro scores 93.5 on LiveCodeBench and posts a Codeforces rating of 3206. On SWE-bench Verified, it trails Claude Opus 4.6 by just 0.2 points (80.6% vs 80.8%).
Architecture
- Hybrid Attention Architecture — combines efficient attention mechanisms for handling both short and ultra-long sequences.
- Engram conditional memory — enables efficient processing of 1M context windows without proportional compute scaling.
- MoE with low active params — keeps inference costs dramatically lower than dense models of equivalent total parameter count.
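The "low active params" point can be illustrated with a toy top-k router: every token sees only a few experts, so compute scales with the active parameters (13B for V4 Flash), not the total (284B). This is a simplified sketch, not DeepSeek's actual routing code:

```python
import math

# Toy MoE routing: a gate scores all experts, but only the top-k run
# per token. Simplified illustration; not DeepSeek's actual router.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_logits, k=2):
    """Return (expert_index, normalized_weight) pairs for the top-k experts."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# 8 experts, but only 2 activate for this token:
print(route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2))
```

Only the selected experts' weights are touched for that token, which is why a 284B-total model can bill and run like a 13B one.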
Pricing
| Model | Input | Output | Context |
|---|---|---|---|
| V4 Flash | $0.14/MTok | $0.28/MTok | 1M |
| V4 Pro | $1.74/MTok | $3.48/MTok | 1M |
V4 Flash is one of the most cost-effective reasoning models available. V4 Pro offers frontier-level coding at a fraction of closed-source pricing.
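To make the table concrete, here is a small per-request cost estimate from the listed per-MTok prices (prices from the table above; actual billing may include other factors such as cached input):

```python
# Estimate per-request cost from the listed per-MTok prices.
# Assumption: cost = tokens * price / 1M, with no discounts or extras.
PRICES = {
    "deepseek/deepseek-v4-flash": (0.14, 0.28),  # ($/MTok input, $/MTok output)
    "deepseek/deepseek-v4-pro":   (1.74, 3.48),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A large request: 100K tokens in, 8K tokens out.
print(f"{request_cost('deepseek/deepseek-v4-flash', 100_000, 8_000):.4f}")  # 0.0162
print(f"{request_cost('deepseek/deepseek-v4-pro', 100_000, 8_000):.4f}")    # 0.2018
```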
How to Use It
Use the standard model names — TheRouter handles routing automatically:
```shell
curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Explain the MoE architecture"}],
    "max_tokens": 4096
  }'
```
For V4 Pro, use deepseek/deepseek-v4-pro. Both models are available on the Global endpoint (api.therouter.ai) and the China endpoint (airouter-api.mizone.me).
Open Source
Both V4 Flash and V4 Pro are released under the Apache 2.0 license with full model weights available on Hugging Face. You can self-host, fine-tune, or use them commercially without restrictions.
Getting Started
Already on TheRouter? Just set the model to deepseek/deepseek-v4-flash or deepseek/deepseek-v4-pro — no other changes needed.
Questions? Reach out on GitHub.