Groq & Mistral Now Live on TheRouter — Two New Providers, 15 New Models
TheRouter now integrates Groq and Mistral as new providers — bringing our total to 9 integrated providers and adding 15 new models to the routing mesh. Both are live now with full OpenAI-compatible API support.
Groq offers LPU-powered inference with sub-100ms time-to-first-token and 300+ tok/sec throughput; its launch lineup includes Llama 4 Scout, Llama 3.3 70B, Llama 3.1 8B, and Qwen3 32B. Mistral brings 11 models, including Mistral Large 3 (675B), Devstral 2, Magistral Small, Codestral with 256K context, Mistral Small, the Ministral 3B/8B/14B family, and Pixtral Large. All models are available through api.therouter.ai with the same OpenAI-compatible API.
Groq — The Fastest AI Inference
Groq builds custom silicon purpose-built for language model inference. Their LPU (Language Processing Unit) delivers sub-100ms time-to-first-token and sustained throughput above 300 tokens per second, making it one of the fastest inference platforms available today.
- Custom LPU silicon — purpose-built hardware that eliminates the memory bandwidth bottleneck of GPU-based inference, delivering deterministic low-latency responses.
- Sub-100ms TTFT, 300+ tok/sec — ideal for latency-sensitive applications where every millisecond matters: real-time chat, interactive coding assistants, and rapid prototyping.
- 4 models at launch — Llama 4 Scout (109B MoE, 16 experts), Llama 3.3 70B, Llama 3.1 8B, and Qwen3 32B cover the full range from lightweight to frontier-capable.
- Best for — latency-sensitive apps, real-time chat interfaces, rapid prototyping, and any workflow where speed-to-response is the primary constraint.
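The wall-clock impact of those latency numbers is easy to estimate: total response time is roughly time-to-first-token plus generation time at sustained throughput. A back-of-envelope sketch (the 30 tok/sec, 1 s TTFT baseline is an illustrative assumption for comparison, not a measured figure for any specific provider):

```python
def response_time(ttft_s: float, output_tokens: int, tok_per_s: float) -> float:
    """Estimate wall-clock completion time: time-to-first-token
    plus token generation at a sustained throughput."""
    return ttft_s + output_tokens / tok_per_s

# 900 output tokens on Groq-class hardware (0.1 s TTFT, 300 tok/sec)
groq_time = response_time(0.1, 900, 300)        # ~3.1 s
# Same request at an illustrative slower baseline (1 s TTFT, 30 tok/sec)
baseline_time = response_time(1.0, 900, 30)     # ~31.0 s
print(f"{groq_time:.1f}s vs {baseline_time:.1f}s")
```

For long generations the throughput term dominates, which is why a 10x throughput difference translates into roughly a 10x difference in perceived response time.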
Groq Pricing
| Model | Input | Output | Context |
|---|---|---|---|
| Llama 4 Scout | $0.11/MTok | $0.34/MTok | 128K |
| Llama 3.3 70B | $0.59/MTok | $0.79/MTok | 128K |
| Llama 3.1 8B | $0.05/MTok | $0.08/MTok | 128K |
| Qwen3 32B | $0.29/MTok | $0.39/MTok | 128K |
Groq pricing reflects the speed premium — you pay slightly more per token but get responses in a fraction of the time. For throughput-bound workloads, the wall-clock savings often outweigh the per-token cost.
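To put the per-token numbers in concrete terms, here is a small cost helper over the table above (prices per million tokens, exactly as listed):

```python
# Groq prices from the table above: (input $/MTok, output $/MTok)
GROQ_PRICES = {
    "llama-4-scout": (0.11, 0.34),
    "llama-3.3-70b": (0.59, 0.79),
    "llama-3.1-8b":  (0.05, 0.08),
    "qwen3-32b":     (0.29, 0.39),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at per-million-token prices."""
    in_price, out_price = GROQ_PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# 10K prompt tokens + 2K completion tokens on Llama 3.3 70B
print(round(request_cost("llama-3.3-70b", 10_000, 2_000), 5))  # 0.00748
```

At well under a cent per sizable request, the per-token premium rarely matters next to the latency win for interactive workloads.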
Mistral — European AI Excellence
Paris-based Mistral has built one of the most comprehensive open-weight model families in the industry. From their 675B flagship to efficient 3B edge models, Mistral covers coding, reasoning, multilingual, and vision — all under permissive licenses with EU data sovereignty considerations.
- Mistral Large 3 (675B) — flagship model rivaling GPT-4-class performance across reasoning, coding, and multilingual tasks with native tool use and JSON mode.
- Devstral 2 — purpose-built for software engineering with agentic coding, multi-file editing, and deep codebase understanding.
- Magistral Small — reasoning specialist with extended thinking capabilities for math, logic, and step-by-step problem solving.
- Codestral (256K context) — dedicated code generation model with the largest context window in its class, supporting 80+ programming languages.
- Ministral family (3B / 8B / 14B) — compact models optimized for edge deployment, on-device inference, and cost-sensitive batch processing.
- Pixtral Large — multimodal vision model for document understanding, chart analysis, and visual reasoning tasks.
- 11 models total — the broadest single-provider lineup we have added, covering general-purpose, coding, reasoning, vision, and edge deployment use cases.
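With a lineup this broad, a simple task-to-model mapping is a reasonable starting point. A sketch of that idea is below. Note that only `mistral/mistral-large-3` appears in this post's examples; the other model IDs follow the same naming pattern but are assumptions, so verify them against the live model list before relying on them:

```python
# Illustrative task-to-model defaults for the Mistral lineup.
# All IDs except "mistral/mistral-large-3" are assumed from the
# naming pattern in this post, not confirmed identifiers.
MISTRAL_BY_TASK = {
    "general":        "mistral/mistral-large-3",
    "coding":         "mistral/codestral",
    "agentic-coding": "mistral/devstral-2",
    "reasoning":      "mistral/magistral-small",
    "vision":         "mistral/pixtral-large",
    "edge":           "mistral/ministral-3b",
}

def pick_mistral_model(task: str) -> str:
    """Map a task category to a sensible Mistral default,
    falling back to the cheap general-purpose model."""
    return MISTRAL_BY_TASK.get(task, "mistral/mistral-small")

print(pick_mistral_model("coding"))
```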
Mistral Pricing
| Model | Input | Output | Context |
|---|---|---|---|
| Mistral Large 3 | $2.00/MTok | $6.00/MTok | 128K |
| Devstral 2 | $0.50/MTok | $1.50/MTok | 128K |
| Magistral Small | $0.50/MTok | $1.50/MTok | 40K |
| Codestral | $0.30/MTok | $0.90/MTok | 256K |
| Mistral Small | $0.10/MTok | $0.30/MTok | 32K |
| Pixtral Large | $2.00/MTok | $6.00/MTok | 128K |
| Ministral 3B | $0.04/MTok | $0.10/MTok | 128K |
| Ministral 8B | $0.10/MTok | $0.10/MTok | 128K |
| Ministral 14B | $0.15/MTok | $0.30/MTok | 128K |
Mistral pricing spans a 50x range, from $0.04/MTok input on Ministral 3B to $2.00/MTok on Large 3 and Pixtral Large. Pick the model that matches your task complexity and budget.
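One way to see that spread is to rank the lineup by cost for a fixed workload (here, 1M input plus 1M output tokens, using the prices in the table above):

```python
# Mistral prices from the table above: (input $/MTok, output $/MTok)
MISTRAL_PRICES = {
    "mistral-large-3": (2.00, 6.00),
    "devstral-2":      (0.50, 1.50),
    "magistral-small": (0.50, 1.50),
    "codestral":       (0.30, 0.90),
    "mistral-small":   (0.10, 0.30),
    "pixtral-large":   (2.00, 6.00),
    "ministral-3b":    (0.04, 0.10),
    "ministral-8b":    (0.10, 0.10),
    "ministral-14b":   (0.15, 0.30),
}

def workload_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Dollar cost of a workload measured in millions of tokens."""
    in_price, out_price = MISTRAL_PRICES[model]
    return input_mtok * in_price + output_mtok * out_price

# Rank cheapest-first for 1M input + 1M output tokens
ranked = sorted(MISTRAL_PRICES, key=lambda m: workload_cost(m, 1.0, 1.0))
print(ranked[0], ranked[-1])  # cheapest and most expensive
```

For this workload, Ministral 3B comes to $0.14 while Large 3 and Pixtral Large come to $8.00, so batch jobs on the small models cost pennies where frontier-model runs cost dollars.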
Why This Matters for Routing
Adding Groq and Mistral is not just about more models — it fundamentally expands what TheRouter can optimize for. Groq's raw speed and Mistral's breadth complement our existing providers in meaningful ways:
- Same API, more options — point your client to api.therouter.ai and set the model name. No SDK changes, no new authentication flows.
- Health-aware routing — if Groq experiences an outage, TheRouter automatically routes your Llama requests to an alternative provider serving the same model. You get resilience without writing failover logic.
- Speed where it matters — use Groq-hosted models for latency-critical paths and Mistral or other providers for throughput-heavy batch workloads. TheRouter lets you mix providers behind a single API key.
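Because routing happens server-side, switching providers is just a different model string in an otherwise identical request. A minimal sketch of that idea, building payloads only (the endpoint and model names come from the examples in this post):

```python
import json

ENDPOINT = "https://api.therouter.ai/v1/chat/completions"

def chat_payload(model: str, prompt: str, max_tokens: int = 4096) -> dict:
    """Build the OpenAI-compatible request body TheRouter expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Groq-hosted model for the latency-critical path...
fast = chat_payload("meta/llama-4-scout", "Summarize this ticket")
# ...and Mistral for a batch job: same endpoint, same key, only the model differs
batch = chat_payload("mistral/mistral-large-3", "Summarize this ticket")

print(json.dumps(fast, indent=2))
```

Everything except the `model` field stays constant across providers, which is what makes mixing them behind a single API key practical.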
How to Use Them
Use the standard model names — TheRouter handles routing automatically:
```bash
# Groq-hosted Llama 4 Scout
curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-4-scout",
    "messages": [{"role": "user", "content": "Explain the MoE architecture"}],
    "max_tokens": 4096
  }'
```
```bash
# Mistral Large 3
curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral/mistral-large-3",
    "messages": [{"role": "user", "content": "Write a Python web scraper"}],
    "max_tokens": 4096
  }'
```

All 15 new models are available on the Global endpoint (api.therouter.ai) and the China endpoint (airouter-api.mizone.me).
Getting Started
Already on TheRouter? Just set the model name — no other changes needed. New to TheRouter? Sign up and get an API key in under a minute.
Questions? Reach out on GitHub.