MiniMax M2.5 High Speed

MiniMax M2.5 High Speed is the throughput-optimized variant that retains M2.5's full planning and software engineering capabilities.

ReasoningTool UseImplicit Caching

index.ts

import { streamText } from 'ai'

const result = streamText({
  model: 'minimax/minimax-m2.5-highspeed',
  prompt: 'Why is the sky blue?'
})

Overview About Providers Throughput Latency Uptime Status Similar FAQ

Playground

Try out MiniMax M2.5 High Speed by MiniMax. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

MiniMax M2.5 High Speed

Ask MiniMax M2.5 High Speed anything to try it out.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider

Context	Latency	Throughput	Input	Output	Cache	Web Search	Per Query	Capabilities	ZDR	No Training	Release Date

MiniMax

205K

1.0s

42tps

$0.60/M

$2.40/M

Read:$0.03/M

Write:$0.38/M

—

02/12/2026

Novita AI

205K

1.6s

60tps

$0.60/M

$2.40/M

Read:$0.03/M

Write:—

—

02/12/2026

More models by MiniMax

Model

Context	Latency	Throughput	Input	Output	Cache	Web Search	Per Query	Capabilities	Providers	ZDR	No Training	Release Date

minimax/minimax-m3

1.3s

159tps

$0.60/M$0.30/M

$2.40/M$1.20/M

Read:

$0.12/M$0.06/M

Write:

—

05/31/2026

minimax/minimax-m2.7

205K

0.3s

205tps

$0.15/M

$0.60/M

Read:$0.06/M

Write:$0.38/M

—

03/18/2026

minimax/minimax-m2.7-highspeed

205K

0.9s

44tps

$0.60/M

$2.40/M

Read:$0.06/M

Write:$0.38/M

—

03/18/2026

minimax/minimax-m2.5

0.4s

300tps

$0.07/M

$0.57/M

Read:$0.03/M

Write:$0.38/M

—

02/12/2026

minimax/minimax-m2.1

205K

0.4s

293tps

$0.30/M

$1.20/M

Read:$0.03/M

Write:$0.38/M

—

12/23/2025

minimax/minimax-m2

205K

0.7s

73tps

$0.30/M

$1.20/M

Read:$0.03/M

Write:$0.38/M

—

10/27/2025

About MiniMax M2.5 High Speed

MiniMax M2.5 High Speed targets autonomous coding agents that run for extended periods and need fast token generation. See live metrics on this page for current throughput. For cost estimates, use and your expected token volumes rather than a fixed hourly figure.

The "highspeed" label doesn't indicate a distilled or reduced-capability model. MiniMax M2.5 High Speed retains the full architectural planning mode of standard M2.5. It decomposes problems into specifications before writing code, handles the complete development lifecycle across Web, Android, iOS, Windows, and Mac platforms, and matches the same reported SWE-Bench Verified score as standard M2.5.

The tradeoff is straightforward: you pay roughly twice as much per token in exchange for generating tokens roughly twice as fast on paper. For batch jobs where wall-clock time doesn't matter, standard M2.5 is more economical. For interactive sessions, streaming UIs, and agent loops where latency compounds, the highspeed variant can win.

What To Consider When Choosing a Provider

Configuration: The highspeed variant lists at roughly double the standard M2.5 input and output rates on many providers. AI Gateway's per-request cost tracking lets you measure whether the throughput gain pays for itself in your workload before you commit at scale.
Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use MiniMax M2.5 High Speed

Best For

Long-running coding agents: Autonomous sessions that span multi-minute or multi-hour workflows
Streaming pair-programming: Chat interfaces where token delivery speed is user-visible
High-concurrency services: Per-request latency accumulates across simultaneous users
Wall-clock time priority: Halving inference time justifies a 2x per-token cost increase

Consider Alternatives When

Fast tasks anyway: Your tasks complete in seconds regardless of throughput, so the speed premium has no practical impact
Cost over latency: Per-token cost is a harder constraint than latency and standard M2.5 delivers the same output for less
Multi-agent coordination: You need the features introduced in M2.7

Conclusion

MiniMax M2.5 High Speed occupies a clear niche: same model, faster output, higher price. It's the right pick when your agent or application is bottlenecked on token generation speed and the cost difference is justified by reduced wall-clock time or improved user experience.

Agent Stack

Core Platform

Tools

Learn

Build

Explore

MiniMax M2.5 High Speed

Playground

Providers

More models by MiniMax

About MiniMax M2.5 High Speed

What To Consider When Choosing a Provider

When to Use MiniMax M2.5 High Speed

Best For

Consider Alternatives When

Conclusion