MiniMax-M2.5 launches with SOTA coding claims; verify SWE-bench results
MiniMax launched MiniMax-M2.5, a fast, low-cost coding and agentic model, but teams should validate its headline SWE-bench gains with internal tests given recent concerns about benchmark contamination. The model claims state-of-the-art results in coding, agentic tool use, and search: 80.2% on SWE-bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp. It also runs 37% faster than M2.1 (matching Claude Opus 4.6 speed) and costs about $1/hour at 100 tokens/sec, per its [Hugging Face card](https://huggingface.co/unsloth/MiniMax-M2.5). Skepticism is warranted: OpenAI stopped reporting SWE-bench Verified results after an audit found flawed tests and evidence of benchmark contamination across major models, suggesting that reported gains may reflect training exposure rather than general capability ([Blockchain.News report](https://blockchain.news/news/openai-abandons-swe-bench-verified-contamination-flawed-tests)). If you trial M2.5, note the card's operational tips (Unsloth quantization and llama.cpp's --jinja chat-template flag) to streamline self-hosting and cost control, via the same [Hugging Face source](https://huggingface.co/unsloth/MiniMax-M2.5).
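For budgeting, the card's throughput-pricing claim can be converted into a rough per-token cost. The $1/hour and 100 tokens/sec figures come from the brief above; the conversion itself is plain arithmetic:

```python
# Convert the card's "$1/hour at 100 tokens/sec" claim into an
# approximate cost per million generated tokens.
dollars_per_hour = 1.00     # self-hosting cost cited by the model card
tokens_per_second = 100     # claimed sustained throughput

tokens_per_hour = tokens_per_second * 3600          # 360,000 tokens/hour
cost_per_million = dollars_per_hour / tokens_per_hour * 1_000_000

print(f"~${cost_per_million:.2f} per million tokens")  # ≈ $2.78
```

At roughly $2.78 per million tokens, the figure is easy to compare against hosted-API pricing, though real throughput will vary with hardware and batch size.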
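If you self-host with llama.cpp as the card suggests, a launch command might look like the sketch below. The GGUF repo path and quantization tag are illustrative assumptions, not values from the brief; verify both against the Unsloth listing before downloading.

```shell
# Sketch: serve a quantized GGUF build with llama.cpp's OpenAI-compatible server.
# The "-GGUF" repo suffix and Q4_K_M quant tag are assumptions to verify.
llama-server \
  -hf unsloth/MiniMax-M2.5-GGUF:Q4_K_M \
  --jinja \
  -c 32768 \
  --port 8080
# --jinja applies the model's embedded chat template, as the card recommends;
# -c sets the context length, which you should tune to available VRAM.
```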