
DeepSeek V4 Now Available on TheRouter — Direct API Integration

DeepSeek released V4 Flash and V4 Pro today — their most powerful open-source models to date. Both are already live on TheRouter with day-one support.


TheRouter adds day-one support for DeepSeek V4 Flash and V4 Pro via direct DeepSeek API integration. V4 Flash: 284B MoE with 13B active parameters, 1M context, 384K max output, $0.14/$0.28 per MTok. V4 Pro: 1.6T MoE with 49B active parameters, 1M context, 384K max output, $1.74/$3.48 per MTok. Both models feature Hybrid Attention Architecture and Engram conditional memory. Apache 2.0 licensed with weights on Hugging Face. Model IDs: deepseek/deepseek-v4-flash, deepseek/deepseek-v4-pro.

V4 Flash — Best Value for Everyday Tasks

  • 284B MoE, 13B active — Mixture of Experts architecture with only 13B parameters active per forward pass, keeping inference fast and cost low.
  • 1M context, 384K max output — process entire codebases or long documents in a single request with massive output capacity.
  • Default thinking mode — built-in chain-of-thought reasoning enabled by default for better accuracy.
  • $0.14 / $0.28 per MTok (input/output) — among the most affordable reasoning models available.

V4 Pro — Complex Reasoning Powerhouse

  • 1.6T MoE, 49B active — the largest open-source MoE model, approaching the non-thinking performance of Claude Opus 4.6.
  • 1M context, 384K max output — same generous context and output limits as V4 Flash.
  • $1.74 / $3.48 per MTok (input/output) — competitive pricing for a model at this capability level.

Benchmarks

Benchmark            V4 Pro   V4 Flash   Claude Opus 4.6
SWE-bench Verified   80.6%    79.0%      80.8%
LiveCodeBench        93.5     —          —
Codeforces Rating    3206     —          —

V4 Pro leads the compared models on LiveCodeBench (93.5) and posts the highest Codeforces rating (3206). On SWE-bench Verified, it comes within 0.2 percentage points of Claude Opus 4.6.

Architecture

  • Hybrid Attention Architecture — combines efficient attention mechanisms for handling both short and ultra-long sequences.
  • Engram conditional memory — enables efficient processing of 1M context windows without proportional compute scaling.
  • MoE with low active params — keeps inference costs dramatically lower than dense models of equivalent total parameter count.
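To see why MoE inference stays cheap, here is a minimal top-k expert-routing sketch in plain Python. The expert count, k, embedding size, and dot-product gating below are toy choices for illustration only — they are not DeepSeek's actual configuration:

```python
import math
import random

def route_token(token_embedding, expert_weights, k=2):
    """Score each expert against the token and keep only the top-k.

    In an MoE layer, each token runs through just k experts out of the
    full set, so compute scales with active parameters, not total ones.
    """
    # Gating: one score per expert (a plain dot product, for illustration).
    scores = [sum(t * w for t, w in zip(token_embedding, expert))
              for expert in expert_weights]
    # Softmax over scores, then pick the k highest-weighted experts.
    exp_scores = [math.exp(s - max(scores)) for s in scores]
    total = sum(exp_scores)
    probs = [e / total for e in exp_scores]
    top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return top_k, [probs[i] for i in top_k]

random.seed(0)
num_experts, dim = 8, 4  # toy sizes, not the real architecture
experts = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
token = [random.gauss(0, 1) for _ in range(dim)]

chosen, weights = route_token(token, experts, k=2)
print(chosen)  # indices of the 2 experts this token activates
```

Only the chosen experts' weights participate in the forward pass, which is how a 284B-parameter model can run with just 13B active parameters per token.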

Pricing

Model      Input        Output       Context
V4 Flash   $0.14/MTok   $0.28/MTok   1M
V4 Pro     $1.74/MTok   $3.48/MTok   1M

V4 Flash is one of the most cost-effective reasoning models available. V4 Pro offers frontier-level coding at a fraction of closed-source pricing.
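To make the table concrete, here is a quick cost estimate at the listed per-MTok rates. The workload below (200K input tokens, 4K output tokens) is a hypothetical example, not a measured request:

```python
# Per-MTok prices from the table above (USD per million tokens).
PRICES = {
    "deepseek/deepseek-v4-flash": {"input": 0.14, "output": 0.28},
    "deepseek/deepseek-v4-pro":   {"input": 1.74, "output": 3.48},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-MTok rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: a 200K-token codebase in, a 4K-token answer out.
flash = request_cost("deepseek/deepseek-v4-flash", 200_000, 4_000)
pro = request_cost("deepseek/deepseek-v4-pro", 200_000, 4_000)
print(f"V4 Flash: ${flash:.4f}  V4 Pro: ${pro:.4f}")
# → V4 Flash: $0.0291  V4 Pro: $0.3619
```

At these rates, even a request that fills a fifth of the context window costs a few cents on V4 Flash and well under a dollar on V4 Pro.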

How to Use It

Use the standard model names — TheRouter handles routing automatically:

curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Explain the MoE architecture"}],
    "max_tokens": 4096
  }'

For V4 Pro, use deepseek/deepseek-v4-pro. Both models are available on the Global endpoint (api.therouter.ai) and the China endpoint (airouter-api.mizone.me).
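If you prefer Python over curl, the same request can be sketched with the standard library, assuming the endpoint accepts the OpenAI-style chat-completions payload shown in the curl example above (the prompt is just a placeholder):

```python
import json
import os
import urllib.request

API_URL = "https://api.therouter.ai/v1/chat/completions"

# Same request body as the curl example.
payload = {
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Explain the MoE architecture"}],
    "max_tokens": 4096,
}

def post_chat(api_key: str) -> dict:
    """POST the chat request and return the decoded JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Only fire the request when a key is actually configured.
if os.environ.get("THE_ROUTER_API_KEY"):
    reply = post_chat(os.environ["THE_ROUTER_API_KEY"])
    print(reply["choices"][0]["message"]["content"])
```

Swapping in "deepseek/deepseek-v4-pro" for the model field, or the China endpoint host for the URL, requires no other changes.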

Open Source

Both V4 Flash and V4 Pro are released under the Apache 2.0 license with full model weights available on Hugging Face. You can self-host, fine-tune, or use them commercially without restrictions.

Getting Started

Already on TheRouter? Just set the model to deepseek/deepseek-v4-flash or deepseek/deepseek-v4-pro — no other changes needed.


Questions? Reach out on GitHub.