howtonotcode.com

Unsloth

AI Tool
2 stories | First seen: 2026-02-20 | Last seen: 2026-03-03 | Website

Resources

Links to check for updates: homepage, feed, or git repo.

Homepage

Stories


Coding Benchmarks Shake-up: Qwen 3.5, MiniMax M2.5, and a SWE-bench Reality Check

Open models such as Alibaba’s Qwen 3.5 and MiniMax M2.5 post strong coding-agent results, but OpenAI’s audit of SWE-bench Verified shows contamination and flawed tests that can mislead real-world adoption decisions. Alibaba’s Qwen 3.5 family uses a sparse mixture-of-experts (MoE) design (397B total parameters, 17B active per token), ships open weights under Apache 2.0, and shows strong instruction following and competitive coding scores on public benchmarks; setup guidance and comparisons with frontier models are covered in this deep-dive guide [Qwen 3.5: The Complete Guide](https://techie007.substack.com/p/qwen-35-the-complete-guide-benchmarks). MiniMax claims state-of-the-art coding and agentic performance for M2.5, along with faster task completion and an ultra-low runtime cost (about $1/hour at 100 tok/s), and reports scores on coding and browsing evaluations [MiniMax-M2.5 on Hugging Face](https://huggingface.co/unsloth/MiniMax-M2.5). OpenAI, however, reports that many SWE-bench Verified tasks have broken tests and that major models were trained on the benchmark’s solutions; it has stopped using the metric and urges caution when interpreting scores [OpenAI Abandons SWE-bench Verified](https://blockchain.news/news/openai-abandons-swe-bench-verified-contamination-flawed-tests). For quick, low-cost trials of several “top models,” a short video explainer points to an Alibaba Cloud coding plan that bundles popular options [This $3 AI Coding Plan Gives You Every Top Model You Need](https://www.youtube.com/watch?v=Qnz7S-5fzWo&pp=ygUXbmV3IEFJIG1vZGVsIGZvciBjb2RpbmfSBwkJrgoBhyohjO8%3D).
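The headline numbers above reduce to simple arithmetic. The minimal sketch below takes the quoted figures at face value (397B total / 17B active parameters for Qwen 3.5; about $1/hour at 100 tok/s for MiniMax M2.5) and assumes generation runs steadily for the full hour. The helper names are hypothetical.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
# Helper names are hypothetical; inputs are the headline numbers taken at face value.


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of an MoE model's parameters used for each token."""
    return active_params / total_params


def cost_per_million_tokens(dollars_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly price at a steady decode rate into dollars per 1M tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return dollars_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Qwen 3.5 as quoted: 397B total parameters, 17B active per token.
    print(f"active fraction: {active_fraction(397e9, 17e9):.1%}")  # active fraction: 4.3%
    # MiniMax M2.5 as quoted: about $1/hour at 100 tok/s, assumed sustained for the hour.
    print(f"~${cost_per_million_tokens(1.0, 100):.2f} per 1M tokens")  # ~$2.78 per 1M tokens
```

The active fraction governs per-token compute. Memory, however, must still hold all 397B weights, because any expert can be selected for a given token.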

2026-03-03
qwen-35 alibaba alibaba-cloud minimax-m25 openai

Practical LLM efficiency: Magma optimizer, Unsloth on HF Jobs, and NVLink realities

A new wave of efficiency wins (masked optimizers, free small-model fine-tuning, and faster GPU interconnects) can cut LLM costs without sacrificing quality. Google proposes a masking-based adaptive optimizer that it reports outperforms Adam and Muon with negligible overhead and works as a drop-in replacement. In pretraining experiments at the 1B-parameter scale, Momentum-aligned gradient masking (Magma) reached lower perplexity than strong baselines, which makes it a compelling swap for existing pipelines ([paper](https://arxiv.org/abs/2602.15322)). For fast, low-cost customization, Unsloth on Hugging Face Jobs delivers ~2x faster training and ~60% lower VRAM use. Free credits cover fine-tuning compact models such as LFM2.5-1.2B, which can then be deployed on CPUs and phones; the post walks through submitting HF Jobs and provides a ready-made SFT script ([guide](https://huggingface.co/blog/unsloth-jobs), [training script](https://huggingface.co/datasets/unsloth/jobs/resolve/main/sft-lfm2.5.py)). At the hardware layer, multi-GPU throughput is gated by interconnects. Within a node, NVLink offers far more bandwidth than PCIe (A100 ~600 GB/s, H100 ~900 GB/s, Blackwell up to 1.8 TB/s per GPU), so collective operations and DDP settings should match the topology to avoid communication bottlenecks ([multi-GPU overview](https://towardsdatascience.com/how-gpus-communicate/)).
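This summary does not spell out Magma’s actual masking rule. Purely to illustrate the general idea of masking an optimizer update by gradient/momentum alignment, the toy sketch below runs SGD with momentum. It zeroes the coordinates where the fresh gradient’s sign disagrees with the momentum’s sign, then rescales the surviving coordinates. The alignment test, the rescaling, and the name `masked_momentum_step` are assumptions for illustration, not the paper’s algorithm.

```python
# Toy sketch: SGD with momentum plus a sign-agreement update mask.
# The masking rule (keep coordinates where gradient and momentum share a sign)
# and the rescaling are illustrative assumptions, NOT the Magma paper's algorithm.


def masked_momentum_step(params, grads, momentum, lr=0.01, beta=0.9):
    """Return (new_params, new_momentum) for one masked momentum step (hypothetical)."""
    # Exponential moving average of gradients.
    new_momentum = [beta * m + (1 - beta) * g for m, g in zip(momentum, grads)]
    # Mask out coordinates where the fresh gradient disagrees with the momentum direction.
    mask = [1.0 if g * m > 0 else 0.0 for g, m in zip(grads, new_momentum)]
    kept = sum(mask)
    # Scale up the surviving coordinates to compensate for the masked ones.
    scale = len(mask) / kept if kept else 0.0
    new_params = [
        p - lr * scale * k * m for p, k, m in zip(params, mask, new_momentum)
    ]
    return new_params, new_momentum


if __name__ == "__main__":
    # Second coordinate: gradient (-0.5) opposes momentum (+0.85), so it is left unchanged.
    params, mom = masked_momentum_step([1.0, 1.0], [0.5, -0.5], [1.0, 1.0])
    print(params, mom)
```

The rescaling choice keeps the average step size roughly constant when part of the update is masked. A real optimizer would operate on tensors and might combine the mask with adaptive (Adam-style) scaling.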
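To see why topology matters, here is a bandwidth-only estimate of one gradient all-reduce per training step. It assumes a ring all-reduce, in which each GPU sends about 2(N-1)/N times the buffer size. It also assumes 8 GPUs and a 1.2B-parameter model (the LFM2.5-1.2B size mentioned above) with 2-byte bf16 gradients. The sketch uses the NVLink figures quoted above plus an assumed ~128 GB/s for a PCIe Gen5 x16 link. These are nominal aggregate peaks that the code treats as usable per-GPU bandwidth, so absolute times are optimistic and only the ratios are meaningful. The function name is hypothetical.

```python
# Bandwidth-only ring all-reduce estimate; ignores latency, compute overlap, and protocol overhead.


def ring_allreduce_seconds(buffer_bytes: float, num_gpus: int, link_bytes_per_s: float) -> float:
    """Approximate time for one ring all-reduce of buffer_bytes across num_gpus."""
    if num_gpus < 2:
        return 0.0  # nothing to communicate on a single GPU
    bytes_sent_per_gpu = 2 * (num_gpus - 1) / num_gpus * buffer_bytes
    return bytes_sent_per_gpu / link_bytes_per_s


GB = 1e9
# Assumed workload: 1.2B parameters with bf16 gradients (2 bytes each) -> 2.4 GB.
grad_bytes = 1.2e9 * 2
links = {
    "PCIe Gen5 x16 (assumed ~128 GB/s)": 128 * GB,
    "A100 NVLink (~600 GB/s)": 600 * GB,
    "H100 NVLink (~900 GB/s)": 900 * GB,
    "Blackwell NVLink (~1.8 TB/s)": 1800 * GB,
}

if __name__ == "__main__":
    for name, bw in links.items():
        t = ring_allreduce_seconds(grad_bytes, num_gpus=8, link_bytes_per_s=bw)
        print(f"{name}: {t * 1000:.1f} ms per all-reduce")
```

Under these assumptions, the same 2.4 GB gradient sync takes about 33 ms over PCIe versus under 5 ms over H100 NVLink. Because that cost recurs every step, settings such as gradient bucketing and compute/communication overlap in DDP should reflect the interconnect actually available.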

2026-02-20
google hugging-face hugging-face-jobs unsloth nvidia