MiniMax M2.7 Review: Is It Worth the Hype?
MiniMax M2.7 landed on March 18, 2026, and the pitch is bold: frontier-adjacent coding performance at a fraction of the cost. In a head-to-head test by Kilo Code, M2.7 delivered roughly 90% of Claude Opus 4.6's quality for about 7% of the total task cost. That's the kind of number that makes you stop scrolling. But benchmarks are benchmarks, and real work is real work — so I dug into the details, the pricing, the speed issues, and the OpenCode integration to figure out whether this MiniMax M2.7 review ends with "switch now" or "wait and see."
Spoiler: it's neither. It's "add it to your toolbox."
What Is MiniMax M2.7?
M2.7 is a reasoning model from Shanghai-based MiniMax, built on a Sparse Mixture-of-Experts architecture. It has roughly 230 billion total parameters but only activates around 10 billion per token — which is how they keep inference costs so low while still hitting competitive benchmarks. MiniMax calls it "the smallest model in the Tier-1 performance class," and the numbers at least make the case plausible.
The context window is 204,800 tokens (input plus output combined), with a max output of 131,072 tokens. That puts it above GPT-5.2's 128K but well below Gemini 3.1 Pro's 2M window. It's text-only — no images, no audio, no video natively — though it supports tool calling and MCP, so you can bolt on image understanding and web search through external tools.
One thing worth noting: M2.7 is proprietary. This is a sharp turn from MiniMax's earlier M2 and M2.5 (which I reviewed previously), which were fully open-weight under Apache 2.0. The community has not been thrilled about this. If you valued the open approach, this stings.
I should be honest: I really liked M2.5. I said so in the review. But after a few weeks of daily use, I found myself drifting back to Claude for bigger tasks, and then spending a lot of time testing GPT 5.4. That's not a knock on M2.5 — it's just the reality of how model usage works now. You explore, you compare, you settle into patterns. M2.7 is MiniMax's attempt to pull you back.
The marketing headline is "self-evolution" — M2.7 participated in its own training loop, running 100+ optimization cycles on its own scaffold and reportedly improving internal benchmarks by 30%. That sounds dramatic, but MindStudio's caution is worth internalising: internal benchmark gains don't automatically translate to neutral third-party evaluations. It's an interesting research direction, not a finished revolution.
Benchmarks — Close to Frontier, Not Quite There
Coding Benchmarks
The headline numbers are genuinely impressive. M2.7 scores 56.22% on SWE-Pro, which nearly matches Claude Opus 4.6 at roughly 57%. On SWE-Bench Verified it hits 78%, which actually outperforms Opus's 55% on that particular benchmark. It also scored 55.6% on VIBE-Pro and 57.0% on Terminal Bench 2.
But benchmarks have become increasingly hard to compare meaningfully. Different scaffolds, different evaluation conditions, different levels of self-reporting. The most useful data point I found was Kilo Code's head-to-head comparison across three TypeScript codebases. Both models found all six planted bugs and all ten security vulnerabilities. M2.7 even offered a technically superior fix for one bug — using integer math for currency calculations instead of floats, which is the kind of thing that makes you nod approvingly.
Where Opus pulled ahead was in the details that matter for maintainability: it produced 41 integration tests versus M2.7's 20 unit tests, used a more modular file structure, and demonstrated better architectural thinking overall. The total cost difference? $0.27 for M2.7 versus $3.67 for Opus. That's the kind of gap that changes how you think about task routing.
My take: for the stuff I do day-to-day — quick bug fixes, code reviews, feature scaffolding — M2.7 is more than capable. For complex architectural work where I need the model to think deeply about structure and test coverage, I'm still reaching for something stronger.
General Intelligence
On Artificial Analysis's Intelligence Index v4.0, M2.7 scores 50 — a solid 8-point jump from M2.5's 42, but still behind Gemini 3.1 Pro and GPT-5.4 (both at 57), Opus 4.6 (53), and Sonnet 4.6 (52). Not frontier, but firmly in the "good enough for most tasks" tier.
The hallucination rate is interesting: 34% per Artificial Analysis, which is actually lower than Claude Sonnet 4.6 (46%) and Gemini 3.1 Pro Preview (50%). Take that with a grain of salt — hallucination metrics are notoriously slippery — but it's worth noting.
A few caveats. Most of MiniMax's benchmark claims come from self-evaluation, and independent verification is still catching up. VentureBeat noted that on BridgeBench vibe-coding tasks, M2.7 actually scored worse than its predecessor M2.5. That's the kind of regression that benchmark cherry-picking can hide.
Token Plans and Pricing Breakdown
Pay-As-You-Go
Here's where M2.7 gets genuinely compelling. The API pricing is $0.30 per million input tokens and $1.20 per million output tokens. Compare that to Opus 4.6 at $5/$25 — that's roughly 17× cheaper on input and 21× cheaper on output.
You'll see some articles claiming "50× cheaper on input" — that's comparing against the old Opus 4.1 pricing of $15/M, not the current Opus 4.6 at $5/M. Still a massive gap, just not quite as dramatic as some headlines suggest.
The automatic caching is a genuine standout feature. Cache reads cost just $0.06 per million tokens with zero configuration required. For cache-heavy agentic workloads where you're hitting the same system prompts and context repeatedly, your blended cost can drop to roughly $0.06/M tokens. That's essentially free compared to what you're used to paying.
But — and this is important — there's a verbosity tax. During Artificial Analysis's evaluation, M2.7 generated 87 million output tokens; the median for reasoning models in its price tier is 20 million. That's more than four times the median, which significantly erodes the headline per-token savings. Multiple Reddit users have reported 16,000+ tokens of thinking for simple prompts, and the evaluation alone cost $175 in tokens. So yes, the per-token price is absurdly low, but the model is also absurdly chatty.
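To see how the caching discount and the verbosity tax interact, here's a back-of-envelope cost model using the per-token prices quoted above ($0.30/M input, $1.20/M output, $0.06/M cache reads). The workload numbers — tokens per task and cache-hit ratio — are illustrative assumptions, not measured values:

```python
def blended_cost(input_tokens, output_tokens, cached_tokens=0,
                 in_price=0.30, out_price=1.20, cache_price=0.06):
    """Cost in dollars; prices are per million tokens."""
    fresh_in = input_tokens - cached_tokens
    return (fresh_in * in_price
            + cached_tokens * cache_price
            + output_tokens * out_price) / 1_000_000

# Hypothetical agentic task: 200K input tokens, 90% served from cache.
base = blended_cost(200_000, 5_000, cached_tokens=180_000)

# Same task with a 4x "verbosity tax" on output, per the
# Artificial Analysis observation above.
chatty = blended_cost(200_000, 20_000, cached_tokens=180_000)

print(f"${base:.4f} vs ${chatty:.4f}")  # roughly $0.0228 vs $0.0408
```

Even with the verbosity tax nearly doubling the bill, the absolute numbers stay tiny for cache-heavy work — which is exactly why the per-token pricing is the headline despite the chattiness.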
Subscription Tiers
MiniMax offers six monthly token plans ranging from $10 to $150, bundling M2.7 with their speech, image, video, and music models:
The Starter tier at $10/month gives you 1,500 requests per 5-hour rolling window on the standard variant — M2.7 only, no extras. Plus ($20/month) bumps that to 4,500 requests and adds speech and image generation. Max ($50/month) gives you 15,000 requests with the full suite including video and music.
Then there are the highspeed variants: Plus-Highspeed ($40/month), Max-Highspeed ($80/month), and Ultra-Highspeed ($150/month) at 30,000 requests per 5-hour window with everything included.
Yearly plans save about 17% (Starter drops to $100/year, Ultra-Highspeed to $1,500/year).
Which tier makes sense depends entirely on your usage pattern. If you're doing light coding work — a few tasks a day, nothing too intensive — the Starter plan or even pay-as-you-go might be more economical. For heavy agentic workloads where you're burning through tokens continuously, the Max or Ultra-Highspeed tiers start to look like genuine bargains. The Starter tier's break-even against pay-as-you-go isn't always favourable for casual use, so do the maths for your specific workflow before committing.
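A quick way to do that maths: compare a tier's flat fee against what the same request volume would cost pay-as-you-go. The tokens-per-request figures below are assumptions for illustration; plug in your own from real usage logs:

```python
IN_PRICE, OUT_PRICE = 0.30, 1.20   # $/M tokens, from the API pricing above

def payg_monthly_cost(requests_per_day, in_tok_per_req, out_tok_per_req, days=30):
    """Estimated pay-as-you-go monthly cost in dollars."""
    per_req = (in_tok_per_req * IN_PRICE + out_tok_per_req * OUT_PRICE) / 1_000_000
    return requests_per_day * days * per_req

# Light use: ~20 requests/day, 8K input / 2K output per request.
light = payg_monthly_cost(20, 8_000, 2_000)
print(f"light PAYG: ${light:.2f}/mo vs $10 Starter")
```

Under these assumptions light use comes out well under the $10 Starter fee, which is why pay-as-you-go often wins for casual workloads; the subscription only pays off once your monthly token burn clearly exceeds the flat price.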
The Speed Problem
This is where the story gets less rosy.
MiniMax claims roughly 100 tokens per second for the highspeed variant and around 60 TPS for standard. They market M2.7 as "3× faster than Opus." Independent testing tells a different story. Artificial Analysis measured the standard variant at 45.6 tokens per second — against a median of 95.8 TPS for reasoning models in its price tier. Time to first token was 2.49 seconds versus a 1.84-second median. That's noticeably sluggish in interactive use.
It's possible the highspeed variant performs closer to claims, but Artificial Analysis appears to have tested the standard endpoint, and I haven't found independent benchmarks of the highspeed tier yet. Until someone verifies those numbers, I'd treat the 100 TPS claim with healthy scepticism.
The underlying issue is architectural. M2.7 uses full attention across its context window, so performance degrades further on long-context workloads. Community members in the llama.cpp project have flagged this: "Minimax applied full attention, thus it's so slow in long ctx." The 205K context window looks competitive on paper, but pushing anywhere near that limit will test your patience.
I wish speed got more attention across the industry. A model that's 10% smarter but 3× slower often feels worse in practice. The highspeed plan is compelling in theory — if it actually delivers the throughput MiniMax claims. That's a big "if" right now. I haven't spent the money on the highspeed subscription yet — I have too many subscriptions as it is — but it's on my list. Maybe next month, when I carve out time for proper testing.
Using M2.7 with OpenCode
If you're already using OpenCode (and I switched from Claude Code to OpenCode recently), M2.7 slots in naturally. MiniMax is a preloaded built-in provider — you run opencode auth login, select MiniMax, enter your API key, and you're off. The model uses the Anthropic-compatible API format at https://api.minimax.io/anthropic/v1.
One gotcha: clear any existing ANTHROPIC_AUTH_TOKEN and ANTHROPIC_BASE_URL environment variables first. If you've been using Claude through OpenCode, those will conflict. I've seen people lose 20 minutes to this before checking the obvious.
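Clearing those variables takes two lines in your shell before you run the OpenCode login flow (the grep check is just a sanity pass; it prints "clean" when nothing is left over):

```shell
# Remove conflicting Anthropic env vars before pointing OpenCode at MiniMax.
unset ANTHROPIC_AUTH_TOKEN ANTHROPIC_BASE_URL

# Verify nothing ANTHROPIC_* is still set (prints "clean" if so).
env | grep -E '^ANTHROPIC_' || echo "clean"
```

Remember to also remove any `export` of these from your shell profile, or they'll quietly come back in the next session.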
There are five access paths total: direct API integration, OpenCode Go ($5 first month, then $10/month — includes M2.7, M2.5, GLM-5, and Kimi K2.5), OpenCode Zen for pay-as-you-go, OpenRouter (minimax/minimax-m2.7), and Ollama Cloud. MiniMax provides an official setup guide that's actually decent.
For best results, the recommended inference parameters are temperature=1.0, top_p=0.95, top_k=40. These are higher than what you might instinctively set, but they seem to produce better output quality with this particular model.
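As a concrete sketch, a request with those recommended parameters against the Anthropic-compatible endpoint would look roughly like this. The model identifier and request shape follow the Anthropic Messages API convention; both are assumptions here, so verify them against MiniMax's own docs before relying on this:

```python
import json

payload = {
    "model": "MiniMax-M2.7",   # assumed model identifier, check MiniMax docs
    "max_tokens": 4096,
    "temperature": 1.0,        # recommended settings from above
    "top_p": 0.95,
    "top_k": 40,
    "messages": [
        {"role": "user",
         "content": "Refactor this function to use integer math for currency."},
    ],
}

# POST this as JSON to https://api.minimax.io/anthropic/v1/messages
# with your MiniMax API key in the auth header.
print(json.dumps(payload, indent=2))
```

If you're going through OpenCode you never touch this payload directly, but it's useful to know where the sampling knobs live when debugging odd output.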
Known issues worth watching: earlier MiniMax models in OpenCode had problems with tool-calling loops and premature task halting. These appear to be improving with M2.7 but haven't fully disappeared. If you hit a case where the model gets stuck in a loop or stops mid-task, it's a known pattern, not something unique to your setup.
Should You Switch? My Honest Take
I'll be honest — I wasn't sure I should write another model review. I'm starting to worry I'll repeat myself. New model drops, benchmarks look great, pricing is aggressive, some caveats apply. You've read that article before. I've written that article before.
And that's kind of the point. New models are impressive, but they're not exciting in the way they used to be. The performance gap between "best" and "good enough" is shrinking every quarter, and the real innovation is happening elsewhere — in the desktop apps, the chat interfaces, the coding CLIs. The features being added to the tools we use daily feel more consequential right now than another few percentage points on SWE-Bench.
That said, M2.7 is still worth your attention for one simple reason: the pricing. Even if you never make it your primary model, having a plan B that costs this little is genuinely useful. Quick tasks, simple fixes, boilerplate generation — there's no reason to burn Opus tokens on that stuff. I don't think "switching" is the right frame anymore. I run the same task on two or three models and pick the best result. It takes an extra minute and saves me from the weaknesses of any single model.
For anything requiring deep architectural thinking, thorough test coverage, or complex multi-file reasoning, I still reach for Opus or Sonnet. The quality gap is real but narrow, and for 80% of daily coding work, that gap doesn't matter.
My recommendation: just try it. The $10 Starter plan or pay-as-you-go makes experimentation essentially risk-free. Run your typical tasks through it for a week. You'll know pretty quickly whether it fits your workflow.
And MiniMax isn't the only one making moves in this space. Xiaomi dropped another surprisingly strong model recently, and Alibaba Cloud has a code subscription that's turning heads. The cost-effective tier of AI coding is getting crowded fast, which is great for the OpenClaw/agent use cases where token costs compound. There's a lot to test, and honestly, we shouldn't complain about having too many good options. That's a problem I'm happy to have.
The market is moving toward "best model per dollar," not just "best model." MiniMax is betting on that future, and right now, the bet looks increasingly sound.
Related Articles
MiniMax M2.5 Review: Why I'm Seriously Considering Ditching Claude
MiniMax M2.5 paired with OpenCode CLI delivers frontier coding performance at 5-20% of Claude's price. Here's my hands-on experience and why it matters.
I Tested GPT 5.4 Against Every Rival — Here's My Honest Review
I tested GPT 5.4 head-to-head against Claude, Gemini, and MiniMax on a real coding task. Here's what the benchmarks don't tell you.
Claude Opus 4.6: What's Actually Better?
Claude Opus 4.6 dominates benchmarks and coding tasks, but is it really better than 4.5? A developer's honest take on what changed and what matters.