Groq & Mistral Now Live on TheRouter — Two New Providers, 15 New Models
TheRouter now integrates Groq and Mistral as new providers — bringing our total to 9 integrated providers and adding 15 new models to the routing mesh. Both are live now with full OpenAI-compatible API support.
Groq offers LPU-powered inference with sub-100ms time-to-first-token and 300+ tok/sec throughput; its launch lineup includes Llama 4 Scout, Llama 3.3 70B, Llama 3.1 8B, and Qwen3 32B. Mistral brings 11 models, including Mistral Large 3 (675B), Devstral 2, Magistral Small, Codestral with 256K context, Mistral Small, the Ministral 3B/8B/14B family, and Pixtral Large. All models are available through api.therouter.ai with the same OpenAI-compatible API.
Groq — The Fastest AI Inference
Groq builds custom silicon purpose-built for language model inference. Their LPU (Language Processing Unit) delivers sub-100ms time-to-first-token and sustained throughput above 300 tokens per second, making it one of the fastest inference platforms available today.
- Custom LPU silicon — purpose-built hardware that eliminates the memory bandwidth bottleneck of GPU-based inference, delivering deterministic low-latency responses.
- Sub-100ms TTFT, 300+ tok/sec — ideal for latency-sensitive applications where every millisecond matters: real-time chat, interactive coding assistants, and rapid prototyping.
- 4 models at launch — Llama 4 Scout (109B MoE, 16 experts), Llama 3.3 70B, Llama 3.1 8B, and Qwen3 32B cover the full range from lightweight to frontier-capable.
- Best for — latency-sensitive apps, real-time chat interfaces, rapid prototyping, and any workflow where speed-to-response is the primary constraint.
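The wall-clock impact of those latency numbers is easy to estimate: total response time is roughly time-to-first-token plus generation time at sustained throughput. A back-of-envelope sketch (the 30 tok/sec, 1 s TTFT baseline is an illustrative assumption for comparison, not a measured figure for any specific provider):

```python
def response_time(ttft_s: float, output_tokens: int, tok_per_s: float) -> float:
    """Estimate wall-clock completion time: time-to-first-token
    plus token generation at a sustained throughput."""
    return ttft_s + output_tokens / tok_per_s

# 900 output tokens on Groq-class hardware (0.1 s TTFT, 300 tok/sec)
groq_time = response_time(0.1, 900, 300)        # ~3.1 s
# Same request at an illustrative slower baseline (1 s TTFT, 30 tok/sec)
baseline_time = response_time(1.0, 900, 30)     # ~31.0 s
print(f"{groq_time:.1f}s vs {baseline_time:.1f}s")
```

For long generations the throughput term dominates, which is why a 10x throughput difference translates into roughly a 10x difference in perceived response time.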
Groq Pricing
| Model | Input | Output | Context |
|---|---|---|---|
| Llama 4 Scout | $0.11/MTok | $0.34/MTok | 128K |
| Llama 3.3 70B | $0.59/MTok | $0.79/MTok | 128K |
| Llama 3.1 8B | $0.05/MTok | $0.08/MTok | 128K |
| Qwen3 32B | $0.29/MTok | $0.39/MTok | 128K |
Groq pricing reflects the speed premium — you pay slightly more per token but get responses in a fraction of the time. For throughput-bound workloads, the wall-clock savings often outweigh the per-token cost.
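To put the per-token numbers in concrete terms, here is a small cost helper over the table above (prices per million tokens, exactly as listed):

```python
# Groq prices from the table above: (input $/MTok, output $/MTok)
GROQ_PRICES = {
    "llama-4-scout": (0.11, 0.34),
    "llama-3.3-70b": (0.59, 0.79),
    "llama-3.1-8b":  (0.05, 0.08),
    "qwen3-32b":     (0.29, 0.39),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at per-million-token prices."""
    in_price, out_price = GROQ_PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# 10K prompt tokens + 2K completion tokens on Llama 3.3 70B
print(round(request_cost("llama-3.3-70b", 10_000, 2_000), 5))  # 0.00748
```

At well under a cent per sizable request, the per-token premium rarely matters next to the latency win for interactive workloads.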
Mistral — European AI Excellence
Paris-based Mistral has built one of the most comprehensive open-weight model families in the industry. From their 675B flagship to efficient 3B edge models, Mistral covers coding, reasoning, multilingual, and vision — all under permissive licenses with EU data sovereignty considerations.
- Mistral Large 3 (675B) — flagship model rivaling GPT-4-class performance across reasoning, coding, and multilingual tasks with native tool use and JSON mode.
- Devstral 2 — purpose-built for software engineering with agentic coding, multi-file editing, and deep codebase understanding.
- Magistral Small — reasoning specialist with extended thinking capabilities for math, logic, and step-by-step problem solving.
- Codestral (256K context) — dedicated code generation model with the largest context window in its class, supporting 80+ programming languages.
- Ministral family (3B / 8B / 14B) — compact models optimized for edge deployment, on-device inference, and cost-sensitive batch processing.
- Pixtral Large — multimodal vision model for document understanding, chart analysis, and visual reasoning tasks.
- 11 models total — the broadest single-provider lineup we have added, covering general-purpose, coding, reasoning, vision, and edge deployment use cases.
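With a lineup this broad, a simple task-to-model mapping is a reasonable starting point. A sketch of that idea is below. Note that only `mistral/mistral-large-3` appears in this post's examples; the other model IDs follow the same naming pattern but are assumptions, so verify them against the live model list before relying on them:

```python
# Illustrative task-to-model defaults for the Mistral lineup.
# All IDs except "mistral/mistral-large-3" are assumed from the
# naming pattern in this post, not confirmed identifiers.
MISTRAL_BY_TASK = {
    "general":        "mistral/mistral-large-3",
    "coding":         "mistral/codestral",
    "agentic-coding": "mistral/devstral-2",
    "reasoning":      "mistral/magistral-small",
    "vision":         "mistral/pixtral-large",
    "edge":           "mistral/ministral-3b",
}

def pick_mistral_model(task: str) -> str:
    """Map a task category to a sensible Mistral default,
    falling back to the cheap general-purpose model."""
    return MISTRAL_BY_TASK.get(task, "mistral/mistral-small")

print(pick_mistral_model("coding"))
```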
Mistral Pricing
| Model | Input | Output | Context |
|---|---|---|---|
| Mistral Large 3 | $2.00/MTok | $6.00/MTok | 128K |
| Devstral 2 | $0.50/MTok | $1.50/MTok | 128K |
| Magistral Small | $0.50/MTok | $1.50/MTok | 40K |
| Codestral | $0.30/MTok | $0.90/MTok | 256K |
| Mistral Small | $0.10/MTok | $0.30/MTok | 32K |
| Pixtral Large | $2.00/MTok | $6.00/MTok | 128K |
| Ministral 3B | $0.04/MTok | $0.10/MTok | 128K |
| Ministral 8B | $0.10/MTok | $0.10/MTok | 128K |
| Ministral 14B | $0.15/MTok | $0.30/MTok | 128K |
Mistral pricing spans a 50x range, from $0.04/MTok input on Ministral 3B to $2.00/MTok on Large 3 and Pixtral Large. Pick the model that matches your task complexity and budget.
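One way to see that spread is to rank the lineup by cost for a fixed workload (here, 1M input plus 1M output tokens, using the prices in the table above):

```python
# Mistral prices from the table above: (input $/MTok, output $/MTok)
MISTRAL_PRICES = {
    "mistral-large-3": (2.00, 6.00),
    "devstral-2":      (0.50, 1.50),
    "magistral-small": (0.50, 1.50),
    "codestral":       (0.30, 0.90),
    "mistral-small":   (0.10, 0.30),
    "pixtral-large":   (2.00, 6.00),
    "ministral-3b":    (0.04, 0.10),
    "ministral-8b":    (0.10, 0.10),
    "ministral-14b":   (0.15, 0.30),
}

def workload_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Dollar cost of a workload measured in millions of tokens."""
    in_price, out_price = MISTRAL_PRICES[model]
    return input_mtok * in_price + output_mtok * out_price

# Rank cheapest-first for 1M input + 1M output tokens
ranked = sorted(MISTRAL_PRICES, key=lambda m: workload_cost(m, 1.0, 1.0))
print(ranked[0], ranked[-1])  # cheapest and most expensive
```

For this workload, Ministral 3B comes to $0.14 while Large 3 and Pixtral Large come to $8.00, so batch jobs on the small models cost pennies where frontier-model runs cost dollars.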
Why This Matters for Routing
Adding Groq and Mistral is not just about more models — it fundamentally expands what TheRouter can optimize for. Groq's raw speed and Mistral's breadth complement our existing providers in meaningful ways:
- Same API, more options — point your client to api.therouter.ai and set the model name. No SDK changes, no new authentication flows.
- Health-aware routing — if Groq experiences an outage, TheRouter automatically routes your Llama requests to an alternative provider serving the same model. You get resilience without writing failover logic.
- Speed where it matters — use Groq-hosted models for latency-critical paths and Mistral or other providers for throughput-heavy batch workloads. TheRouter lets you mix providers behind a single API key.
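Because routing happens server-side, switching providers is just a different model string in an otherwise identical request. A minimal sketch of that idea, building payloads only (the endpoint and model names come from the examples in this post):

```python
import json

ENDPOINT = "https://api.therouter.ai/v1/chat/completions"

def chat_payload(model: str, prompt: str, max_tokens: int = 4096) -> dict:
    """Build the OpenAI-compatible request body TheRouter expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Groq-hosted model for the latency-critical path...
fast = chat_payload("meta/llama-4-scout", "Summarize this ticket")
# ...and Mistral for a batch job: same endpoint, same key, only the model differs
batch = chat_payload("mistral/mistral-large-3", "Summarize this ticket")

print(json.dumps(fast, indent=2))
```

Everything except the `model` field stays constant across providers, which is what makes mixing them behind a single API key practical.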
How to Use Them
Use the standard model names — TheRouter handles routing automatically:
```bash
# Groq-hosted Llama 4 Scout
curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-4-scout",
    "messages": [{"role": "user", "content": "Explain the MoE architecture"}],
    "max_tokens": 4096
  }'
```
```bash
# Mistral Large 3
curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral/mistral-large-3",
    "messages": [{"role": "user", "content": "Write a Python web scraper"}],
    "max_tokens": 4096
  }'
```

All 15 new models are available on the Global endpoint (api.therouter.ai) and the China endpoint (airouter-api.mizone.me).
Getting Started
Already on TheRouter? Just set the model name — no other changes needed. New to TheRouter? Sign up and get an API key in under a minute.
Questions? Reach out on GitHub.