Nemotron Super 120B

nvidia

nvidia/nemotron-super-120b

NVIDIA's hybrid LatentMoE model (120B total, 12B active). Mamba-2 + Attention + MoE architecture with 1M context. Multi-Token Prediction for fast inference.

Context length

Max output

262K

Input price

$0.240 per million tokens

Output price

$1.02 per million tokens

Modalities

Text → Text

Pricing details

Type	Rate
Input	$0.240 per million tokens
Output	$1.02 per million tokens
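
To make the rates concrete, here is a minimal sketch of a per-request cost estimate. It assumes billing is strictly linear in token count, with no minimum charge, caching discount, or rounding rule; the page does not state any of these, so treat the result as an approximation. The function name `estimate_cost_usd` is hypothetical.

```python
# Rates copied from the pricing table above (USD per one million tokens).
INPUT_USD_PER_MILLION = 0.240
OUTPUT_USD_PER_MILLION = 1.02


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: cost scales linearly with token counts and nothing else
    (no minimum fee, no discounts). This is not confirmed by the page.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000
```

Under these assumptions, a request with 100,000 input tokens and 2,000 output tokens would cost roughly $0.026, most of it from the input side.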

Supported parameters

temperature, max_tokens, top_p, tools, tool_choice, response_format, stop
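
As an illustration of how the parameters listed above might be attached to a request body, the sketch below builds a JSON-ready dictionary and rejects any parameter name that is not on the list. It assumes the parameters sit at the top level of the body next to `model` and `messages`, as is common for `/v1/chat/completions` style endpoints; the page does not confirm this. `build_payload` and `SUPPORTED_PARAMS` are hypothetical names.

```python
from typing import Any

MODEL_ID = "nvidia/nemotron-super-120b"

# Parameter names copied from the supported-parameters list above.
SUPPORTED_PARAMS = frozenset({
    "temperature", "max_tokens", "top_p", "tools",
    "tool_choice", "response_format", "stop",
})


def build_payload(messages: list[dict[str, Any]], **params: Any) -> dict[str, Any]:
    """Return a request body, refusing parameters the page does not list.

    Assumption: parameters are top-level keys of the JSON body.
    """
    unknown = set(params) - SUPPORTED_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {"model": MODEL_ID, "messages": messages, **params}
```

For example, `build_payload([{"role": "user", "content": "Hello"}], temperature=0.2, max_tokens=256)` yields a body with both settings applied, while passing an unlisted name such as `top_k` raises `ValueError` before anything is sent.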

API usage example

cURL

curl https://api.therouter.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $THE_ROUTER_API_KEY" \
  -d '{
    "model": "nvidia/nemotron-super-120b",
    "messages": [
      {"role": "user", "content": "Summarize the key points from this input."}
    ]
  }'
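
Python

The same request can be sketched in Python with only the standard library. The URL, headers, model ID, and message mirror the cURL call above; the helper names `build_request` and `send` are hypothetical, and the shape of the JSON response is not documented on this page, so `send` returns the parsed body without assuming any fields.

```python
import json
import urllib.request

API_URL = "https://api.therouter.ai/v1/chat/completions"
MODEL_ID = "nvidia/nemotron-super-120b"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the parsed JSON body as-is.

    Raises urllib.error.HTTPError on non-2xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A typical call reads the key from the `THE_ROUTER_API_KEY` environment variable, as the cURL example does: `send(build_request("Summarize the key points from this input.", os.environ["THE_ROUTER_API_KEY"]))`, after `import os`. Keeping request construction separate from sending makes it possible to inspect or log the request without touching the network.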