Reasoning

Control model reasoning depth and monitor token costs in complex tasks.

POST /v1/responses
Name                Type     Description
reasoning.effort    string   low | medium | high
reasoning.summary   string   none | auto to request reasoning summaries
max_output_tokens   integer  Cap on response size after reasoning

Reasoning Request

JSON
{
  "model": "openai/o3-mini",
  "input": "Plan a zero-downtime migration strategy for a payments database.",
  "reasoning": {
    "effort": "high",
    "summary": "auto"
  },
  "max_output_tokens": 600
}
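A request body like the one above can be assembled and validated before sending. The sketch below is illustrative, not part of the API: the `build_reasoning_request` helper and its effort validation are assumptions for this example; only the field names come from the table above.

```python
import json

def build_reasoning_request(prompt: str, effort: str = "medium",
                            summary: str = "auto",
                            max_output_tokens: int = 600) -> dict:
    """Assemble a /v1/responses payload with reasoning controls.

    Hypothetical helper: field names match the parameter table;
    the model slug is just an example.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"invalid reasoning.effort: {effort!r}")
    return {
        "model": "openai/o3-mini",
        "input": prompt,
        "reasoning": {"effort": effort, "summary": summary},
        "max_output_tokens": max_output_tokens,
    }

payload = build_reasoning_request(
    "Plan a zero-downtime migration strategy for a payments database.",
    effort="high",
)
print(json.dumps(payload, indent=2))
```

POST the serialized payload to your endpoint with your usual HTTP client and auth header.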
reasoning-usage.json
{
  "usage": {
    "input_tokens": 35,
    "output_tokens": 412,
    "reasoning_tokens": 228,
    "total_tokens": 675
  }
}
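The usage block can be inspected programmatically to track reasoning overhead. The sketch below assumes, as in the sample above, that total_tokens = input + output + reasoning; whether a provider reports reasoning tokens inside or alongside output_tokens varies, so check this against your provider's accounting.

```python
# Usage numbers copied from the sample response above.
usage = {
    "input_tokens": 35,
    "output_tokens": 412,
    "reasoning_tokens": 228,
    "total_tokens": 675,
}

# Sanity check: in this sample, total = input + output + reasoning.
assert (usage["input_tokens"] + usage["output_tokens"]
        + usage["reasoning_tokens"]) == usage["total_tokens"]

# Share of the total budget consumed by hidden reasoning.
reasoning_share = usage["reasoning_tokens"] / usage["total_tokens"]
print(f"reasoning share: {reasoning_share:.1%}")
```

A share this high (about a third of all tokens) is a signal to try a lower effort setting for similar tasks.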
reasoning.summary Provider Support
The reasoning.summary parameter is accepted in requests but may not be passed through to all upstream providers. If the target model or provider does not support reasoning summaries, the parameter is silently ignored.
Reasoning Cost Tradeoff
Higher reasoning effort improves reliability on difficult tasks but increases latency and token usage. Start at medium and scale up only when needed.
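One way to apply that advice is an effort ladder: attempt the task at medium and retry at high only when the response comes back incomplete. A minimal sketch, where `send` is a hypothetical callable wrapping the /v1/responses request and the "incomplete" status check is an assumption about the response shape:

```python
EFFORT_LADDER = ["medium", "high"]  # start cheap, escalate only when needed

def solve_with_escalation(send, prompt: str) -> dict:
    """Retry at higher reasoning effort if the response is incomplete.

    `send(prompt, effort)` is a hypothetical callable returning a
    response dict with a "status" field.
    """
    for effort in EFFORT_LADDER:
        response = send(prompt, effort)
        if response.get("status") != "incomplete":
            return response
    return response  # best attempt, even if still incomplete

# Stub provider: pretend medium effort truncates and high succeeds.
def fake_send(prompt, effort):
    return {"status": "complete" if effort == "high" else "incomplete",
            "effort": effort}

result = solve_with_escalation(fake_send, "hard task")
print(result["effort"])  # → high
```

Against a real endpoint, pair this with max_output_tokens so an escalated retry cannot blow the response budget.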