Audio Inputs

Process speech and audio with TheRouter.ai-compatible models

Audio is sent as input_audio content with base64-encoded bytes. Direct URL audio inputs are not currently supported in chat completion content parts.

Encoding requirement
Audio files must be base64 encoded and include a format field such as wav or mp3.
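The encoding requirement above can be captured in a small helper that builds the input_audio content part used throughout this guide. This is a sketch; the helper name and the narrowed format union are illustrative, not part of the API:

```typescript
// Formats narrowed to the two named in the note above; widen as needed.
type AudioFormat = "wav" | "mp3";

// Build an input_audio content part from raw audio bytes.
// Buffer is a Node.js global, so no extra import is required there.
function audioPart(bytes: Uint8Array, format: AudioFormat) {
  const data = Buffer.from(bytes).toString("base64");
  return { type: "input_audio" as const, input_audio: { data, format } };
}
```

The returned object can be placed directly into a message's content array alongside a text part.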

Send audio for transcription

TypeScript
import fs from "node:fs/promises";

// Read the file and base64-encode its raw bytes
const audioBuffer = await fs.readFile("./meeting.wav");
const base64Audio = audioBuffer.toString("base64");

const response = await fetch("https://api.therouter.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: "Bearer <THEROUTER_API_KEY>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "google/gemini-2.5-flash",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Transcribe this audio and list action items." },
          {
            type: "input_audio",
            input_audio: {
              data: base64Audio,
              format: "wav",
            },
          },
        ],
      },
    ],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
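The last two lines of the example index into the response optimistically. A defensive variant is sketched below: the body may carry an error object instead of choices, so a small extraction helper (illustrative, not part of the API) can fail with a readable message rather than a TypeError:

```typescript
// Minimal shape of the fields this helper reads from the response body.
interface ChatCompletion {
  error?: { message?: string };
  choices?: { message?: { content?: string } }[];
}

// Return the assistant text, or throw a descriptive error when the
// response is an API error object or has no text content.
function extractText(data: ChatCompletion): string {
  if (data.error) {
    throw new Error(`API error: ${data.error.message ?? "unknown"}`);
  }
  const content = data.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("No text content in response");
  }
  return content;
}
```

With this in place, the example's final line becomes `console.log(extractText(data))`.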

Supported formats

audio-formats.txt
wav
mp3
aiff
aac
ogg
flac
m4a
pcm16
pcm24
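One way to use this list is a lookup that derives the format field from a filename extension. A sketch (the helper is illustrative; note that raw PCM data has no conventional extension, so pcm16/pcm24 callers should set the format explicitly rather than rely on this):

```typescript
// The format values from audio-formats.txt above.
const SUPPORTED_FORMATS = new Set([
  "wav", "mp3", "aiff", "aac", "ogg", "flac", "m4a", "pcm16", "pcm24",
]);

// Map a filename to a supported format string, or null when the
// extension is missing or unsupported, so callers can fail fast.
function formatFromFilename(name: string): string | null {
  const ext = name.toLowerCase().split(".").pop() ?? "";
  return SUPPORTED_FORMATS.has(ext) ? ext : null;
}
```

Returning null for unknown extensions forces an explicit decision instead of sending a guessed format the model may reject.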

Audio input and output patterns

TheRouter.ai can route audio-aware models for speech-to-text and broader audio reasoning tasks. For text-to-speech or dedicated transcription endpoints, follow provider/model-specific capabilities in the models catalog and API reference.

Capability check
Before sending audio workloads to production, verify the selected model supports audio input and the target task (transcription, translation, summarization, or response audio output).
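The capability check above can be automated as a pre-flight filter over the models catalog. This is a sketch only: the `input_modalities` field name is hypothetical and stands in for whatever capability metadata the catalog actually exposes:

```typescript
// Minimal, assumed shape of a catalog entry; the input_modalities
// field is a placeholder for the catalog's real capability metadata.
interface ModelInfo {
  id: string;
  input_modalities?: string[];
}

// Keep only the ids of models that declare audio input support.
function audioCapable(models: ModelInfo[]): string[] {
  return models
    .filter((m) => m.input_modalities?.includes("audio"))
    .map((m) => m.id);
}
```

Running this against the catalog response before deployment avoids discovering an unsupported modality via runtime errors in production.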