LLM Resource Delivery Methods

A technical reference for sending images, audio, video, and documents to multimodal models through TheRouter. Covers all five delivery methods, provider support matrices, size limits, and practical code examples.

The Five Delivery Methods

LLM providers support different ways to reference a resource (an image, audio clip, video, or document) within a chat message. Understanding which methods each provider supports is critical for building robust multimodal applications.

1. Base64 Inline

The resource is base64-encoded and embedded directly in the request body as a data URI or structured field. This is the most universally supported method — every provider that accepts multimodal input supports base64.

"image_url": {
  "url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
}

Pros: Works everywhere, no external dependency, no authentication needed for the resource itself.
Cons: Inflates request payload by ~33% (base64 encoding overhead), adds latency, hits body size limits faster.

2. HTTP URL Reference

The resource URL is passed directly and the provider fetches it server-side. The URL must be reachable without authentication. A presigned URL (for example, an S3 presigned URL) works only if it stays valid until the provider has fetched it.

"image_url": {
  "url": "https://cdn.example.com/photo.jpg"
}

Pros: Minimal request size, no encoding overhead, easy for images already hosted publicly.
Cons: Not supported by Amazon Bedrock (hard requirement: base64 or S3 only). Provider fetches add latency. URL must remain accessible until the request is processed.

3. File Upload (file_id)

Upload a file once via a provider's Files API, receive a file_id, then reference that ID in subsequent requests. Avoids re-transmitting the same resource in multi-turn conversations.

// After uploading: POST /v1/files
"image_url": {
  "url": "file-abc123"
}

Pros: Efficient for repeated use of the same resource across multiple turns. Anthropic Files API files persist indefinitely (no expiry). OpenAI files persist until deleted.
Cons: Requires a separate upload step. Anthropic Files API is currently in beta (requires anthropic-beta: files-api-2025-04-14 header). Not available on Bedrock or Vertex AI channels. Gemini Files API files expire after 48 hours.
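
Because retention differs by provider, a client that caches file_id values needs to know whether a cached ID is still usable. The sketch below is one way to track that, assuming the lifetimes stated above (48 hours for Gemini, no expiry for OpenAI and Anthropic). FileRef, RETENTION, the provider keys, and the sample file ID are hypothetical names chosen for illustration; they are not part of any provider SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention windows taken from the text above; None means "until deleted".
# The provider keys are illustrative labels, not official identifiers.
RETENTION: dict[str, Optional[timedelta]] = {
    "openai": None,
    "anthropic": None,
    "gemini": timedelta(hours=48),
}


@dataclass(frozen=True)
class FileRef:
    """A previously uploaded file, remembered so later turns can reuse it."""

    provider: str
    file_id: str
    uploaded_at: datetime

    def is_usable(self, now: Optional[datetime] = None) -> bool:
        """Return True if the file should still exist on the provider side."""
        ttl = RETENTION[self.provider]
        if ttl is None:
            return True
        now = now or datetime.now(timezone.utc)
        return now - self.uploaded_at < ttl


uploaded = datetime(2025, 1, 1, tzinfo=timezone.utc)
ref = FileRef("gemini", "files/abc123", uploaded)  # hypothetical file ID
print(ref.is_usable(uploaded + timedelta(hours=47)))  # True
print(ref.is_usable(uploaded + timedelta(hours=49)))  # False
```

When is_usable returns False, the client would upload the file again and store the new ID before building the next request.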

4. Cloud Storage URI

Reference a file in a cloud storage bucket by its native URI. Each provider uses its own storage ecosystem: Amazon S3 for Bedrock, Google Cloud Storage for Gemini.

// Amazon S3 (Bedrock)
"source": { "s3Location": { "uri": "s3://my-bucket/image.jpg" } }

// Google Cloud Storage (Gemini)
"fileData": { "mimeType": "image/jpeg", "fileUri": "gs://my-bucket/image.jpg" }

Pros: Eliminates base64 encoding overhead. Ideal for large files (video, multi-page PDFs) where inline base64 would exceed payload limits. No expiry (unlike Gemini Files API).
Cons: Requires pre-provisioned cloud storage in the provider's ecosystem with appropriate IAM permissions. Not portable across providers. Adds operational complexity.
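
Since the two URI schemes map to different request shapes, a small dispatcher can keep that detail out of calling code. The following is a minimal sketch that reproduces only the two shapes shown above; storage_reference is a hypothetical helper name, and the sketch ignores mime_type for S3 because the Bedrock shape above does not include one.

```python
from urllib.parse import urlparse


def storage_reference(uri: str, mime_type: str) -> dict:
    """Build the provider-native block for an s3:// or gs:// URI."""
    scheme = urlparse(uri).scheme
    if scheme == "s3":
        # Bedrock shape, as shown above (mime_type is not used here).
        return {"source": {"s3Location": {"uri": uri}}}
    if scheme == "gs":
        # Gemini shape, as shown above.
        return {"fileData": {"mimeType": mime_type, "fileUri": uri}}
    raise ValueError(f"unsupported storage scheme: {scheme!r}")


print(storage_reference("gs://my-bucket/image.jpg", "image/jpeg"))
```

Raising on any other scheme makes the lack of portability explicit: an https:// URL or another cloud's URI fails early instead of producing a request the provider would reject.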

5. YouTube URL

Only supported by Google Gemini for video input. Pass a public YouTube video URL directly; Gemini processes the video without requiring download or upload.

"fileData": {
  "mimeType": "video/youtube",
  "fileUri": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}

Pros: Zero-cost delivery for publicly available videos. No storage required. Up to 10 videos per request (Gemini 2.5+).
Cons: Gemini-only. Public videos only — private or unlisted videos are rejected. Cannot be used in batch mode.
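
The per-request limit and the URL requirement can be checked on the client before a request is sent. The sketch below builds the fileData parts shown above and enforces the 10-video limit. youtube_parts is a hypothetical helper, and the set of accepted hostnames is an assumption rather than a documented list. It cannot tell whether a video is private or unlisted; only the provider can reject those.

```python
from urllib.parse import urlparse

MAX_YOUTUBE_VIDEOS = 10  # per-request limit quoted above for Gemini 2.5+
# Assumed set of hostnames; adjust to whatever the provider actually accepts.
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "youtu.be"}


def youtube_parts(urls: list[str]) -> list[dict]:
    """Return one fileData part per YouTube URL, validating count and host."""
    if len(urls) > MAX_YOUTUBE_VIDEOS:
        raise ValueError(f"at most {MAX_YOUTUBE_VIDEOS} videos per request")
    parts = []
    for url in urls:
        if urlparse(url).hostname not in YOUTUBE_HOSTS:
            raise ValueError(f"not a YouTube URL: {url}")
        parts.append({"fileData": {"mimeType": "video/youtube", "fileUri": url}})
    return parts


print(youtube_parts(["https://www.youtube.com/watch?v=dQw4w9WgXcQ"]))
```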

Provider Support Matrix

Which delivery methods each provider accepts, by resource type:

Images

| Provider | Base64 | HTTP URL | File Upload (file_id) | Cloud Storage URI | Max Size | Formats |
|---|---|---|---|---|---|---|
| OpenAI | YES | YES | YES (Files API) | NO | 20 MB | JPEG, PNG, WebP, GIF |
| Anthropic | YES | YES | YES (beta) | NO | 5 MB | JPEG, PNG, WebP, GIF |
| Google Gemini | YES | YES | YES (Files API) | YES (GCS) | 20 MB inline / 2 GB Files API | JPEG, PNG, WebP, HEIC, HEIF |
| Amazon Bedrock | YES | NO | NO | YES (S3) | 3.75 MB | JPEG, PNG, WebP, GIF |
| xAI Grok | YES | YES | NO | NO | 20 MiB | JPEG, PNG |
| Mistral (Pixtral) | YES | YES | NO | NO | ~50 MB | JPEG, PNG, WebP, GIF |
| Cohere (Command A) | YES | YES | NO | NO | 20 MB total | JPEG, PNG, WebP, GIF |

Audio

| Provider | Base64 | HTTP URL | File Upload | Max Size | Formats |
|---|---|---|---|---|---|
| OpenAI (Chat Completions) | YES | NO | NO | ~20 MB | WAV, MP3 |
| OpenAI (Whisper API) | NO | NO | YES (multipart) | 25 MB | MP3, MP4, MPEG, MPGA, WAV, WebM, M4A |
| Google Gemini | YES | YES | YES (Files API) | 20 MB inline / 2 GB Files API | WAV, MP3, AIFF, AAC, OGG, FLAC |
| Anthropic | NO | NO | NO | | None |
| Amazon Bedrock | Varies | NO | NO | | Model-dependent |
| xAI, Mistral, Cohere | NO | NO | NO | | None |

Video

| Provider | Base64 | HTTP URL | Cloud Storage | YouTube URL | Max Size | Formats |
|---|---|---|---|---|---|---|
| Google Gemini | YES (<100 MB) | YES (Files API) | YES (GCS) | YES | 100 MB inline / 20 GB Files API | MP4, MOV, WebM, AVI, FLV, MPEG, WMV, 3GPP |
| Amazon Bedrock | YES | NO | YES (S3) | NO | S3 recommended for large files | MKV, MOV, MP4, WebM, FLV, MPEG, MPG, WMV, 3GP |
| OpenAI, Anthropic, xAI, Mistral, Cohere | NO | NO | NO | NO | | None |

Documents (PDF and Office formats)

| Provider | Base64 | HTTP URL | File Upload (file_id) | Max Size | Formats |
|---|---|---|---|---|---|
| OpenAI | NO (file_id only) | NO | YES (Files API) | 512 MB per file | PDF, DOCX, PPTX, XLSX, TXT, CSV |
| Anthropic | YES | YES | YES (beta) | 32 MB, 100 pages | PDF, TXT |
| Google Gemini | YES | YES | YES (Files API) | 50 MB inline / 2 GB Files API | PDF, TXT, CSV |
| Amazon Bedrock | YES | NO | NO (S3 only) | 4.5 MB default | PDF, CSV, DOC, DOCX, XLS, XLSX, HTML, TXT, MD |
| Mistral (OCR API) | NO (separate endpoint) | NO | NO | 50 MB, 1000 pages | PDF (separate OCR API only) |
| xAI, Cohere | NO | NO | NO | | None |

Per-Provider Details

OpenAI

OpenAI supports a wide range of resource types (images, audio, and documents, but not video) and accepts base64, HTTP URL, or file_id delivery depending on the type. The same format works in both Chat Completions and the Responses API.

Key quirk: GIF support is limited to the first frame only. The detail parameter defaults to auto, which selects low for images under 512×512 px and high for larger images.

Anthropic

Anthropic has strict validation rules but supports both base64 and HTTP URLs for images and PDFs natively, without requiring a separate upload step.

Key quirk: The declared media_type must exactly match the actual file bytes. Declaring image/jpeg while sending PNG bytes causes a hard 400 error. Always detect MIME type from file content, not file extension.
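
Detecting the type from content usually means checking the file's leading "magic" bytes. The sketch below covers the image formats listed for Anthropic plus PDF, using the well-known signatures for each format. sniff_media_type is a hypothetical helper name; a production system might prefer a dedicated detection library.

```python
from typing import Optional


def sniff_media_type(data: bytes) -> Optional[str]:
    """Guess a media type from leading magic bytes; None if unrecognized."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    return None


# A file named photo.jpg that actually contains PNG bytes is reported as PNG.
print(sniff_media_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))  # image/png
```

Using the sniffed value as the declared media type, rather than one derived from the file name, avoids the mismatch described above.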

Google Gemini

Gemini has the most flexible resource delivery across all types. It is the only provider listed here that accepts YouTube URLs, and one of two (with Amazon Bedrock) that accept video input.

Key quirk: The 20 MB inline limit applies to the total request payload after base64 encoding — which adds ~33% overhead. A 15 MB raw file becomes ~20 MB base64-encoded. Use the Files API for files over ~14 MB raw.
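
The arithmetic behind that threshold can be checked directly: padded base64 emits 4 output characters for every 3 input bytes, rounded up. The sketch below assumes the 20 MB limit means 20 × 1024 × 1024 bytes (the source does not say whether MB or MiB is meant); base64_size, fits_inline, and overhead_bytes are hypothetical names.

```python
INLINE_LIMIT_BYTES = 20 * 1024 * 1024  # assumed interpretation of "20 MB"


def base64_size(raw_bytes: int) -> int:
    """Length in bytes of standard padded base64 for raw_bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_inline(raw_bytes: int, overhead_bytes: int = 0) -> bool:
    """True if the encoded file plus other request content stays within the limit."""
    return base64_size(raw_bytes) + overhead_bytes <= INLINE_LIMIT_BYTES


MIB = 1024 * 1024
print(base64_size(15 * MIB) / MIB)                    # 20.0
print(fits_inline(15 * MIB, overhead_bytes=1024))     # False
print(fits_inline(14 * MIB, overhead_bytes=1024))     # True
```

A 15 MiB file encodes to exactly the limit, so any prompt text or JSON wrapping pushes the request over, which is why a threshold of about 14 MB raw leaves a workable margin.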

Amazon Bedrock

Bedrock is the most restrictive provider: HTTP URLs are completely unsupported. All media must be base64-encoded inline or referenced via S3 URI.

Key quirk: HTTP URL delivery is not implemented and there is no announced timeline for support. TheRouter automatically proxies HTTP image URLs to base64 before forwarding requests to Bedrock — this is handled transparently.

xAI Grok

Grok supports only images, and only via base64 or HTTP URL. No audio, video, document, or file upload support.

Key quirk: No Files API. Every request must include the full image data inline or via URL — no pre-upload caching.

Mistral (Pixtral)

Vision support is limited to Pixtral models (pixtral-12b, pixtral-large-2411). PDF handling is a separate OCR API product, not integrated into the chat completions endpoint.

Key quirk: RGBA PNG files cause a decode error. Convert to RGB before encoding (remove the alpha channel). This is a common issue when screenshots or image-editing tools output PNG with transparency.
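
Whether a PNG carries an alpha channel can be read from its IHDR header without decoding the image. The sketch below uses only the standard library and checks the color type byte defined by the PNG specification (4 is grayscale with alpha, 6 is RGBA). png_has_alpha_channel is a hypothetical helper name. It does not detect palette images that use a tRNS transparency chunk, and it only detects the problem: the conversion itself is typically done with an imaging library, for example Pillow's Image.convert("RGB").

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
ALPHA_COLOR_TYPES = {4, 6}  # grayscale+alpha, RGBA (PNG specification)


def png_has_alpha_channel(data: bytes) -> bool:
    """True if data is a PNG whose IHDR color type includes an alpha channel."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", 4-byte width,
    # 4-byte height, 1-byte bit depth, then the color type at offset 25.
    if len(data) < 26 or not data.startswith(PNG_SIGNATURE):
        return False
    if data[12:16] != b"IHDR":
        return False
    return data[25] in ALPHA_COLOR_TYPES
```

A caller would read the file once, run this check, and convert to RGB only when it returns True, so images without an alpha channel are sent unchanged.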

Cohere (Command A)

Vision support was added in command-a-03-2025. Earlier Command R/R+ models do not support image input.

Code Examples

Send an Image via URL (simplest)

# cURL
curl https://api.therouter.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${THEROUTER_API_KEY}" \
  -d '{
    "model": "openai/gpt-4.1",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What is in this image?" },
          {
            "type": "image_url",
            "image_url": { "url": "https://example.com/photo.jpg" }
          }
        ]
      }
    ],
    "max_tokens": 512
  }'
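
# Python (httpx)
import os

import httpx

# Read the key from the same environment variable the cURL example uses.
api_key = os.environ["THEROUTER_API_KEY"]
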
response = httpx.post(
    "https://api.therouter.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "anthropic/claude-sonnet-4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {"url": "https://example.com/photo.jpg"},
                    },
                ],
            }
        ],
        "max_tokens": 512,
    },
)
print(response.json()["choices"][0]["message"]["content"])

Send an Image via Base64

import base64
import os

import httpx

api_key = os.environ["THEROUTER_API_KEY"]

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = httpx.post(
    "https://api.therouter.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "google/gemini-2.5-flash",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image in detail."},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 512,
    },
)

Send a PDF to Claude

import base64
import os

import httpx

api_key = os.environ["THEROUTER_API_KEY"]

with open("report.pdf", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = httpx.post(
    "https://api.therouter.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "anthropic/claude-sonnet-4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Summarize the key findings in this report.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:application/pdf;base64,{b64}"
                        },
                    },
                ],
            }
        ],
        "max_tokens": 1024,
    },
)

Note: TheRouter normalizes the PDF into the native Anthropic document content block format transparently. You send the same image_url shape as with images; the gateway handles the conversion.

Send Audio to an OpenAI Audio Model

import base64
import os

import httpx

api_key = os.environ["THEROUTER_API_KEY"]

with open("question.mp3", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = httpx.post(
    "https://api.therouter.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "openai/gpt-audio-1.5",
        "modalities": ["text", "audio"],
        "audio": {"voice": "alloy", "format": "mp3"},
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {"data": b64, "format": "mp3"},
                    }
                ],
            }
        ],
    },
)
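# With audio output enabled, OpenAI-style responses may leave "content" empty
# and carry the text in message["audio"]["transcript"], so fall back to that.
message = response.json()["choices"][0]["message"]
print(message.get("content") or (message.get("audio") or {}).get("transcript"))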

Multi-Image Request

import base64
import os

import httpx

api_key = os.environ["THEROUTER_API_KEY"]

def to_b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

response = httpx.post(
    "https://api.therouter.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "anthropic/claude-opus-4.6",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Compare these two product designs."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{to_b64('design_v1.png')}"
                        },
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{to_b64('design_v2.png')}"
                        },
                    },
                ],
            }
        ],
        "max_tokens": 1024,
    },
)

TheRouter Resource Handling

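TheRouter normalizes resource delivery across providers transparently. You write your request once using the standard image_url content block format, and the gateway handles provider-specific conversions. Two conversions are described earlier in this reference: HTTP image URLs are converted to base64 before a request is forwarded to Bedrock, and PDFs sent in the image_url shape are converted to Anthropic's native document content block.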

Check model capabilities at the /v1/models endpoint — each model includes an architecture.features array with values like vision, audio_input, and pdf.
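
Once the model list has been fetched, filtering by feature is a small set operation. The sketch below assumes the response wraps models in a top-level "data" list with an "id" per model, as OpenAI-compatible APIs commonly do; the source only states that each model has an architecture.features array. models_with_features is a hypothetical helper, and the sample payload is made-up data, not real capability information.

```python
def models_with_features(models_response: dict, *required: str) -> list[str]:
    """Return IDs of models whose architecture.features include all of `required`."""
    wanted = set(required)
    matches = []
    for model in models_response.get("data", []):  # "data" wrapper is assumed
        features = set(model.get("architecture", {}).get("features", []))
        if wanted <= features:
            matches.append(model["id"])
    return matches


# Made-up sample payload for illustration only.
sample = {
    "data": [
        {"id": "example/model-a", "architecture": {"features": ["vision"]}},
        {"id": "example/model-b", "architecture": {"features": ["vision", "pdf"]}},
        {"id": "example/model-c", "architecture": {"features": ["audio_input"]}},
    ]
}
print(models_with_features(sample, "vision"))         # ['example/model-a', 'example/model-b']
print(models_with_features(sample, "vision", "pdf"))  # ['example/model-b']
```

In practice the dictionary would come from a GET request to /v1/models with the same Authorization header used in the examples above.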

Best Practices

Which delivery method to use

Size optimization

Cost implications

Common gotchas
