MiniMax, an AI startup from China, has released M3 — a model that offers a remarkable combination: frontier-level coding, a one-million-token context window, and native multimodality, all in a single open-weight model. That’s a first.
What M3 Can Do
The benchmark numbers are impressive: 59.0 percent on SWE-Bench Pro, which beats both GPT-5.5 and Gemini 3.1 Pro and approaches Claude Opus 4.7. One caveat applies, though: the tests were run on MiniMax’s own infrastructure with the company’s own agent scaffolding, and independent verification is still pending.
The Architecture
M3 is built on MiniMax Sparse Attention (MSA), a new architecture that, according to MiniMax, cuts per-token compute at a one-million-token context to one-twentieth of the previous generation’s. The claimed result: 9x faster prefill and 15x faster decoding.
This is the real breakthrough, if MiniMax’s claims hold. A long context window is of little use when it is impractically slow, and MSA is designed to solve exactly that problem.
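To get a feel for what those factors would mean in practice, here is a minimal back-of-the-envelope sketch in Python. It only does arithmetic on the figures MiniMax has published (one-twentieth compute, 9x prefill, 15x decoding). The baseline latencies are hypothetical placeholders, not measurements, and the "equivalent dense tokens" figure assumes per-token cost scales linearly with the number of tokens attended to, which is a simplification and not something MiniMax has stated.

```python
# Figures claimed by MiniMax for M3 / MSA (not independently verified).
CONTEXT_TOKENS = 1_000_000
COMPUTE_REDUCTION = 20   # per-token compute cut to one-twentieth
PREFILL_SPEEDUP = 9      # "9x faster prefill"
DECODE_SPEEDUP = 15      # "15x faster decoding"


def equivalent_dense_tokens(context_tokens: int, reduction: int) -> int:
    """Context length whose dense per-token cost would match the reduced cost.

    Assumes cost grows linearly with attended tokens (a simplification).
    """
    return round(context_tokens / reduction)


def scaled_latency(baseline: float, speedup: float) -> float:
    """Latency after applying a claimed speedup factor to a baseline."""
    return baseline / speedup


# Hypothetical baselines for a previous-generation model, for illustration only.
hypothetical_prefill_seconds = 180.0
hypothetical_decode_ms_per_token = 150.0

print(equivalent_dense_tokens(CONTEXT_TOKENS, COMPUTE_REDUCTION))
print(scaled_latency(hypothetical_prefill_seconds, PREFILL_SPEEDUP))
print(scaled_latency(hypothetical_decode_ms_per_token, DECODE_SPEEDUP))
```

Under these assumptions, the per-token cost at one million tokens would resemble that of a dense model attending to about 50,000 tokens. Note that the claimed speedups (9x and 15x) are smaller than the 20x compute reduction, which would be consistent with other costs not shrinking at the same rate, though MiniMax has not yet published a breakdown.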
Open Weights — Almost
M3 has been available since June 1 through MiniMax Code and the API. Open weights and a technical report are expected on Hugging Face and GitHub within ten days — around June 10.
The Bigger Picture
Chinese AI models are catching up fast. After DeepSeek and Qwen 3, M3 is further proof that frontier performance is no longer a monopoly of the big US labs. A startup like MiniMax releasing a model that beats GPT-5.5 on a software engineering benchmark would have been unthinkable a year ago.
Whether the benchmark numbers hold up under independent review remains to be seen. But the ambition alone signals that competition at the frontier is getting tighter.
Sources: MiniMax Blog, The Decoder, TechTimes