Vision API Guide

Use image inputs with Claude, GPT-4o, Gemini, and other vision-capable models through TheRouter's unified API.

Supported Models

Vision-capable models at TheRouter:

Anthropic: Claude Opus 4.6, Sonnet 4.6, Haiku 4.5 (image, PDF)
OpenAI: GPT-5.4, GPT-5.2, GPT-4.1, o3, o4-mini (image)
Google: Gemini 3.1 Pro, Gemini 3 Flash, Gemini 2.5 Pro/Flash (image, PDF)
xAI: Grok 4, Grok 4.1 Fast (image)
Meta: Llama 4 Maverick/Scout, Llama 3.2 90B/11B (image)
Mistral: Mistral Large 3, Ministral 14B/8B/3B (image)

Image Formats

Supported formats:

JPEG/JPG
PNG
GIF
WebP

Note: Maximum image size varies by provider: 5 MB per image for Claude, 20 MB for OpenAI, and 20 MB for Gemini.
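Checking the file size before sending avoids a round trip that ends in a 413. The following is a minimal sketch using the limits quoted in the note above. The names MAX_IMAGE_MB, provider_of, check_image_size, and file_fits are hypothetical helpers, not part of TheRouter's API. It assumes that 1 MB means 1,000,000 bytes (the stricter reading) and that the provider prefix of the model ID decides which limit applies. The guide does not say whether limits are measured on the raw file or on the base64-encoded payload, so treat the result as a rough guard.

```python
import os

# Per-image limits quoted in this guide, in megabytes.
# Assumption: 1 MB is treated as 1,000,000 bytes (the stricter reading).
MAX_IMAGE_MB = {"anthropic": 5, "openai": 20, "google": 20}


def provider_of(model: str) -> str:
    """Return the provider prefix of a model ID such as 'anthropic/claude-sonnet-4.6'."""
    return model.split("/", 1)[0]


def check_image_size(num_bytes: int, model: str) -> bool:
    """Return True if an image of num_bytes fits the limit for the model's provider.

    Providers with no limit listed in this guide are not checked and return True.
    """
    limit_mb = MAX_IMAGE_MB.get(provider_of(model))
    if limit_mb is None:
        return True
    return num_bytes <= limit_mb * 1_000_000


def file_fits(path: str, model: str) -> bool:
    """Check a file on disk against the limit for the given model."""
    return check_image_size(os.path.getsize(path), model)
```

For example, file_fits("path/to/image.jpg", "anthropic/claude-sonnet-4.6") returns False for a 6 MB file, which signals that the image should be compressed or resized first.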

Examples

cURL - Image URL

curl https://api.therouter.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${THEROUTER_API_KEY}" \
  -d '{
    "model": "anthropic/claude-sonnet-4.6",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/image.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'

cURL - Base64 Image

curl https://api.therouter.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${THEROUTER_API_KEY}" \
  -d '{
    "model": "openai/gpt-4.1",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe this image"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
            }
          }
        ]
      }
    ],
    "max_tokens": 1024
  }'

Python - OpenAI SDK

from openai import OpenAI
import base64
import os

# Read the key from the environment, as the cURL and TypeScript examples do
client = OpenAI(
    api_key=os.environ["THEROUTER_API_KEY"],
    base_url="https://api.therouter.ai/v1"
)

# From URL
response = client.chat.completions.create(
    model="google/gemini-3-flash-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.jpg"
                    }
                }
            ]
        }
    ],
    max_tokens=1024
)

print(response.choices[0].message.content)

# From local file (base64)
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("path/to/image.jpg")

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this image"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    max_tokens=1024
)

print(response.choices[0].message.content)
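The local-file example above hard-codes image/jpeg in the data URI, which is wrong for PNG, GIF, or WebP files. The sketch below picks the MIME type from the file extension instead. MIME_BY_EXTENSION, to_data_uri, and image_part_from_file are hypothetical helper names, and trusting the extension is an assumption: a misnamed file would still be labeled incorrectly.

```python
import base64
from pathlib import Path

# Extensions for the formats listed under "Image Formats".
MIME_BY_EXTENSION = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def to_data_uri(data: bytes, filename: str) -> str:
    """Build a base64 data URI, choosing the MIME type from the file extension."""
    ext = Path(filename).suffix.lower()
    if ext not in MIME_BY_EXTENSION:
        raise ValueError(f"Unsupported image extension: {ext!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{MIME_BY_EXTENSION[ext]};base64,{encoded}"


def image_part_from_file(path: str) -> dict:
    """Read a local file and return an image_url content part for the messages array."""
    return {
        "type": "image_url",
        "image_url": {"url": to_data_uri(Path(path).read_bytes(), path)},
    }
```

The returned dictionary can be placed in the content list in place of the hand-written image_url entry.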

TypeScript - OpenAI SDK

import OpenAI from "openai";
import * as fs from "fs";

const client = new OpenAI({
  apiKey: process.env.THEROUTER_API_KEY,
  baseURL: "https://api.therouter.ai/v1",
});

// From URL
const response = await client.chat.completions.create({
  model: "xai/grok-4-1-fast-reasoning",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "What is in this image?",
        },
        {
          type: "image_url",
          image_url: {
            url: "https://example.com/image.jpg",
          },
        },
      ],
    },
  ],
  max_tokens: 1024,
});

console.log(response.choices[0].message.content);

// From local file (base64)
function encodeImage(imagePath: string): string {
  const imageBuffer = fs.readFileSync(imagePath);
  return imageBuffer.toString("base64");
}

const base64Image = encodeImage("path/to/image.jpg");

const responseLocal = await client.chat.completions.create({
  model: "meta/llama-4-maverick",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Describe this image in detail",
        },
        {
          type: "image_url",
          image_url: {
            url: `data:image/jpeg;base64,${base64Image}`,
          },
        },
      ],
    },
  ],
  max_tokens: 1024,
});

console.log(responseLocal.choices[0].message.content);

Error Handling

Common errors when using vision:

400 Bad Request - Unsupported Model

{
  "error": {
    "message": "Model deepseek/deepseek-v3.2 does not support image_url content. Supported models with vision: anthropic/claude-opus-4.6, ...",
    "type": "invalid_request_error",
    "code": "multimodal_not_supported"
  }
}

Solution: Use a vision-capable model. Check the /v1/models endpoint for models with features: ["vision"].
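One way to apply this check programmatically is sketched below. Only the /v1/models path and the features field come from this guide. The response shape (a top-level "data" list whose entries have an "id") is an assumption based on OpenAI-compatible APIs, and fetch_models and vision_model_ids are hypothetical names.

```python
import json
import os
import urllib.request


def fetch_models(base_url: str = "https://api.therouter.ai/v1") -> dict:
    """Fetch the model list. Requires THEROUTER_API_KEY in the environment."""
    request = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {os.environ['THEROUTER_API_KEY']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def vision_model_ids(models_payload: dict) -> list:
    """Return IDs of models whose features list contains "vision".

    Assumption: the payload looks like {"data": [{"id": ..., "features": [...]}]}.
    Entries without a features field are treated as not vision-capable.
    """
    return [
        model["id"]
        for model in models_payload.get("data", [])
        if "vision" in (model.get("features") or [])
    ]
```

Calling vision_model_ids(fetch_models()) once at startup and caching the result lets an application reject image requests for unsupported models before they reach the API.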

400 Bad Request - Invalid Image Format

{
  "error": {
    "message": "Invalid image format. Supported formats: jpeg, png, gif, webp",
    "type": "invalid_request_error",
    "code": "invalid_image_format"
  }
}

Solution: Convert the image to a supported format (JPEG, PNG, GIF, or WebP).
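A file extension does not guarantee the actual format, so it can help to inspect the leading bytes before upload. The sketch below recognizes the four supported formats by their standard file signatures. detect_image_format is a hypothetical helper, and it only checks the signature, not whether the rest of the file is valid.

```python
from typing import Optional


def detect_image_format(data: bytes) -> Optional[str]:
    """Return "jpeg", "png", "gif", or "webp" based on the file signature, else None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

A None result means the bytes match none of the supported formats and the image should be converted before it is sent.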

413 Payload Too Large

Solution: Compress or resize the image. Maximum sizes: Claude 5 MB, OpenAI 20 MB, Gemini 20 MB.

Best Practices

Image URLs: Use publicly accessible URLs, or send private images as base64 data URIs.
Multiple images: Most models accept multiple images per request (Claude: 20, GPT-4o: 10, Gemini: 16).
Image quality: Higher-resolution images consume more tokens and cost more.
PDF support: Claude and Gemini models accept PDFs sent as base64 data URIs.
Model selection: Use cheaper models (Haiku, GPT-4.1-mini, Gemini Flash) for simple image tasks.
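Sending several images in one request means adding one image_url part per image to the content list. The sketch below builds such a message and enforces the per-request counts quoted in the list above. MAX_IMAGES and build_vision_message are hypothetical names, and applying the GPT-4o figure to every openai/ model, like keying limits on the provider prefix in general, is an assumption.

```python
# Per-request image counts quoted in this guide.
# Assumption: the provider prefix of the model ID decides which limit applies.
MAX_IMAGES = {"anthropic": 20, "openai": 10, "google": 16}


def build_vision_message(prompt: str, image_urls: list, model: str) -> dict:
    """Return a user message with one text part followed by one part per image.

    Raises ValueError if the image count exceeds the limit listed for the provider.
    Providers without a listed limit are not checked.
    """
    provider = model.split("/", 1)[0]
    limit = MAX_IMAGES.get(provider)
    if limit is not None and len(image_urls) > limit:
        raise ValueError(
            f"{len(image_urls)} images exceeds the limit of {limit} for {provider}"
        )
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    return {"role": "user", "content": content}
```

The result can be passed as the single entry of the messages list in client.chat.completions.create. Each URL may be an https link or a base64 data URI.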