
Groq & Mistral Now Live on TheRouter — Two New Providers, 15 New Models

TheRouter now integrates Groq and Mistral as new providers — bringing our total to 9 integrated providers and adding 15 new models to the routing mesh. Both are live now with full OpenAI-compatible API support.


TheRouter adds Groq and Mistral as new providers. Groq offers LPU-powered inference with sub-100ms time-to-first-token and 300+ tok/sec throughput. Models include Llama 4 Scout, Llama 3.3 70B, Llama 3.1 8B, and Qwen3 32B. Mistral brings 11 models including Mistral Large 3 (675B), Devstral 2, Magistral Small, Codestral with 256K context, Mistral Small, Ministral 3B/8B/14B, and Pixtral Large. All models are available through api.therouter.ai with the same OpenAI-compatible API.

Groq — The Fastest AI Inference

Groq builds custom silicon purpose-built for language model inference. Their LPU (Language Processing Unit) delivers sub-100ms time-to-first-token and sustained throughput above 300 tokens per second — making it one of the fastest inference platforms commercially available today.

  • Custom LPU silicon — purpose-built hardware that eliminates the memory bandwidth bottleneck of GPU-based inference, delivering deterministic low-latency responses.
  • Sub-100ms TTFT, 300+ tok/sec — ideal for latency-sensitive applications where every millisecond matters: real-time chat, interactive coding assistants, and rapid prototyping.
  • 4 models at launch — Llama 4 Scout (109B MoE, 16 experts), Llama 3.3 70B, Llama 3.1 8B, and Qwen3 32B cover the full range from lightweight to frontier-capable.
  • Best for — latency-sensitive apps, real-time chat interfaces, rapid prototyping, and any workflow where speed-to-response is the primary constraint.
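To see what those latency numbers mean end to end, here is a back-of-the-envelope sketch using the sub-100ms TTFT and 300 tok/sec figures quoted above. The `estimate_latency` helper is purely illustrative, not part of any SDK:

```python
def estimate_latency(output_tokens: int,
                     ttft_s: float = 0.1,
                     tokens_per_sec: float = 300.0) -> float:
    """Rough wall-clock time for a streamed completion:
    time-to-first-token plus steady-state decode time."""
    return ttft_s + output_tokens / tokens_per_sec

# A 600-token answer at Groq's published rates:
print(f"{estimate_latency(600):.1f}s")  # ~2.1s end to end
```

At those rates, even a long answer completes in a couple of seconds, which is why Groq-hosted models suit the interactive use cases listed above.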

Groq Pricing

| Model | Input | Output | Context |
|---|---|---|---|
| Llama 4 Scout | $0.11/MTok | $0.34/MTok | 128K |
| Llama 3.3 70B | $0.59/MTok | $0.79/MTok | 128K |
| Llama 3.1 8B | $0.05/MTok | $0.08/MTok | 128K |
| Qwen3 32B | $0.29/MTok | $0.39/MTok | 128K |

Groq pricing reflects the speed premium — you pay slightly more per token but get responses in a fraction of the time. For throughput-bound workloads, the wall-clock savings often outweigh the per-token cost.
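To make the per-token cost concrete, a small sketch that prices a single request against the table above (prices hard-coded from this post; the `request_cost` helper is illustrative, not an official client):

```python
# Prices per million tokens (input, output), from the Groq pricing table above.
GROQ_PRICES = {
    "llama-4-scout": (0.11, 0.34),
    "llama-3.3-70b": (0.59, 0.79),
    "llama-3.1-8b":  (0.05, 0.08),
    "qwen3-32b":     (0.29, 0.39),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: (tokens / 1M) * per-MTok price."""
    in_price, out_price = GROQ_PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A 2K-in / 1K-out request on Llama 3.1 8B costs a fraction of a cent:
print(f"${request_cost('llama-3.1-8b', 2000, 1000):.6f}")
```

Run the same arithmetic over your expected traffic to decide whether the speed premium pays for itself in wall-clock savings.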


Mistral — European AI Excellence

Paris-based Mistral has built one of the most comprehensive open-weight model families in the industry. From their 675B flagship to efficient 3B edge models, Mistral covers coding, reasoning, multilingual, and vision — all under permissive licenses with EU data sovereignty considerations.

  • Mistral Large 3 (675B) — flagship model rivaling GPT-4-class performance across reasoning, coding, and multilingual tasks with native tool use and JSON mode.
  • Devstral 2 — purpose-built for software engineering with agentic coding, multi-file editing, and deep codebase understanding.
  • Magistral Small — reasoning specialist with extended thinking capabilities for math, logic, and step-by-step problem solving.
  • Codestral (256K context) — dedicated code generation model with the largest context window in its class, supporting 80+ programming languages.
  • Ministral family (3B / 8B / 14B) — compact models optimized for edge deployment, on-device inference, and cost-sensitive batch processing.
  • Pixtral Large — multimodal vision model for document understanding, chart analysis, and visual reasoning tasks.
  • 11 models total — the broadest single-provider lineup we have added, covering general-purpose, coding, reasoning, vision, and edge deployment use cases.

Mistral Pricing

| Model | Input | Output | Context |
|---|---|---|---|
| Mistral Large 3 | $2.00/MTok | $6.00/MTok | 128K |
| Devstral 2 | $0.50/MTok | $1.50/MTok | 128K |
| Magistral Small | $0.50/MTok | $1.50/MTok | 40K |
| Codestral | $0.30/MTok | $0.90/MTok | 256K |
| Mistral Small | $0.10/MTok | $0.30/MTok | 32K |
| Pixtral Large | $2.00/MTok | $6.00/MTok | 128K |
| Ministral 3B | $0.04/MTok | $0.10/MTok | 128K |
| Ministral 8B | $0.10/MTok | $0.10/MTok | 128K |
| Ministral 14B | $0.15/MTok | $0.30/MTok | 128K |

Mistral pricing spans a 50× range — from $0.04/MTok input on Ministral 3B to $2.00/MTok on Large 3 and Pixtral Large. Pick the model that matches your task complexity and budget.
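One way to act on that price spread is a simple budget-aware picker. The sketch below hard-codes input prices from the table and uses price as a crude capability proxy — an illustration of the idea, not an official recommendation:

```python
# Input price per MTok, from the Mistral pricing table above.
MISTRAL_INPUT_PRICE = {
    "ministral-3b": 0.04, "ministral-8b": 0.10, "mistral-small": 0.10,
    "ministral-14b": 0.15, "codestral": 0.30, "devstral-2": 0.50,
    "magistral-small": 0.50, "mistral-large-3": 2.00, "pixtral-large": 2.00,
}

def best_within_budget(budget_per_mtok: float) -> str:
    """Priciest model (crude capability proxy) that fits the input budget."""
    affordable = {m: p for m, p in MISTRAL_INPUT_PRICE.items()
                  if p <= budget_per_mtok}
    return max(affordable, key=affordable.get)

print(best_within_budget(0.20))  # ministral-14b
```

For real workloads you would weight this by task type (coding, vision, reasoning) as well as price; this only shows the budget dimension.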


Why This Matters for Routing

Adding Groq and Mistral is not just about more models — it fundamentally expands what TheRouter can optimize for. Groq's raw speed and Mistral's breadth complement our existing providers in meaningful ways:

  • Same API, more options — point your client to api.therouter.ai and set the model name. No SDK changes, no new authentication flows.
  • Health-aware routing — if Groq experiences an outage, TheRouter automatically routes your Llama requests to an alternative provider serving the same model. You get resilience without writing failover logic.
  • Speed where it matters — use Groq-hosted models for latency-critical paths and Mistral or other providers for throughput-heavy batch workloads. TheRouter lets you mix providers behind a single API key.
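For intuition, the health-aware failover described above behaves roughly like the client-side sketch below. The `call_provider` stub and provider names are purely illustrative — TheRouter does this server-side, so you normally never write this logic yourself:

```python
# Illustrative stand-ins: in reality these would be HTTP calls to providers.
class ProviderDown(Exception):
    pass

def call_provider(provider: str, model: str, prompt: str) -> str:
    if provider == "groq":  # simulate a Groq outage
        raise ProviderDown(provider)
    return f"{provider}:{model}:ok"

def complete_with_failover(model: str, prompt: str,
                           providers=("groq", "fallback")) -> str:
    """Try providers serving the same model in order, skipping outages."""
    for provider in providers:
        try:
            return call_provider(provider, model, prompt)
        except ProviderDown:
            continue
    raise RuntimeError("all providers down")

print(complete_with_failover("llama-3.3-70b", "hi"))  # fallback:llama-3.3-70b:ok
```

The request succeeds on the fallback provider even though the first choice is down — the same guarantee TheRouter gives you behind one API key.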

How to Use Them

Use the standard model names — TheRouter handles routing automatically:

# Groq-hosted Llama 4 Scout
curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-4-scout",
    "messages": [{"role": "user", "content": "Explain the MoE architecture"}],
    "max_tokens": 4096
  }'

# Mistral Large 3
curl https://api.therouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral/mistral-large-3",
    "messages": [{"role": "user", "content": "Write a Python web scraper"}],
    "max_tokens": 4096
  }'

All 15 new models are available on the Global endpoint (api.therouter.ai) and the China endpoint (airouter-api.mizone.me).
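The curl calls above map directly onto a plain stdlib request from Python. This sketch only assembles the request object; the endpoint and model names come from this post, and everything else is generic OpenAI-compatible structure:

```python
import json
import urllib.request

API_URL = "https://api.therouter.ai/v1/chat/completions"

def build_chat_request(api_key: str, model: str, prompt: str,
                       max_tokens: int = 4096) -> urllib.request.Request:
    """Assemble an OpenAI-compatible chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()
    return urllib.request.Request(
        API_URL, data=body, method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

req = build_chat_request("sk-...", "mistral/mistral-large-3", "hello")
# urllib.request.urlopen(req) would send it; omitted here to stay offline.
```

Swap `API_URL` for the China endpoint if that is closer to your users; the payload is identical.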

Getting Started

Already on TheRouter? Just set the model name — no other changes needed. New to TheRouter? Sign up and get an API key in under a minute.


Questions? Reach out on GitHub.