NVIDIA PUB_DATE: 2026.03.19

OPEN-WEIGHT CODING AGENTS HIT 60%+ SWE-BENCH AND GET EASIER TO RUN ON-PREM

Open-weight coding agents leaped forward as NVIDIA’s Nemotron 3 Super tops SWE-Bench and new research streamlines on‑prem and local runs.

NVIDIA unveiled Nemotron 3 Super, a 120B-parameter hybrid MoE model that scores 60.47% on SWE-Bench Verified. It ships with open weights, recipes, and a 1M-token context window, and NVIDIA reports strong throughput in its own benchmarks (Smart Chunks). The pitch targets enterprises that want agentic coding on their own hardware.

On the research side, CodeScout uses RL to train a code search agent that works with only a Unix shell and posts competitive repo‑level localization results on SWE-Bench; the team open-sourced the code and models (repo).

Local inference got a boost from a community “LLM in a Flash” experiment that streams MoE experts from SSD to run Qwen3.5‑397B at 5.5+ tok/s on a 48GB M3 Max MacBook; the code and a write‑up are public (Simon Willison, repo). Separately, a hands‑on report finds smaller Qwen3.5 variants already usable in VS Code via LM Studio and Continue, though still behind top cloud IDE copilots (InfoWorld).
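
To make the SSD-streaming idea concrete, here is a minimal sketch of one way to keep only a few MoE experts resident in memory and fetch the rest on demand. It is an illustration under stated assumptions, not the experiment's actual code: the `ExpertCache` class, the `load_expert` callback, and the least-recently-used eviction policy are all hypothetical, and the real project may rely on a different policy or on the OS page cache.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ExpertCache:
    """Hold at most `capacity` experts in RAM; evict the least recently used."""

    def __init__(self, capacity: int, load_expert: Callable[[Hashable], bytes]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Hypothetical callback: reads one expert's weights from SSD.
        self.load_expert = load_expert
        self._resident: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, expert_id: Hashable) -> bytes:
        if expert_id in self._resident:
            # Cache hit: mark this expert as most recently used.
            self._resident.move_to_end(expert_id)
            self.hits += 1
            return self._resident[expert_id]
        # Cache miss: pay the SSD read, then evict the oldest entry if over budget.
        self.misses += 1
        weights = self.load_expert(expert_id)
        self._resident[expert_id] = weights
        if len(self._resident) > self.capacity:
            self._resident.popitem(last=False)
        return weights

    def resident_ids(self) -> list:
        return list(self._resident)


if __name__ == "__main__":
    # Stand-in loader; a real one would read a weight shard from disk.
    cache = ExpertCache(capacity=2, load_expert=lambda i: f"weights-{i}".encode())
    for expert_id in [0, 1, 0, 2, 1]:
        cache.get(expert_id)
    print(cache.hits, cache.misses, cache.resident_ids())
```

The hit and miss counters matter in practice: every miss is an SSD read on the decode path, so the hit rate for a given router and workload is a rough predictor of achievable tokens per second.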

[ WHY_IT_MATTERS ]

01.

Stronger open weights plus RL-driven repo search shrink the gap with closed coding copilots while keeping code on-prem.

02.

Feasible local MoE inference lowers hardware barriers for large models, expanding deployment options outside cloud APIs.

[ WHAT_TO_TEST ]

Run a SWE-Bench Verified subset, plus SWE-Bench-style tasks drawn from your own codebases: Nemotron 3 Super vs your current model, with and without a CodeScout-style localization step.
Prototype SSD-streamed MoE inference (flash-moe) on a dev workstation or server, and measure throughput, latency spikes, and output quality against 4- to 8-bit quantized baselines.
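
For the throughput and latency-spike measurement, a small helper like the one below may be enough to start. It is a sketch under assumptions: it expects one monotonic timestamp per generated token (however your runtime exposes them), and the "spike" definition of a gap larger than three times the median is an arbitrary threshold chosen for illustration, not a standard.

```python
import statistics
from typing import Dict, List


def summarize_decode(token_times: List[float]) -> Dict[str, float]:
    """Summarize decode speed from per-token timestamps in seconds."""
    if len(token_times) < 2:
        raise ValueError("need at least two token timestamps")
    # Inter-token gaps: the time between consecutive generated tokens.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    elapsed = token_times[-1] - token_times[0]
    ordered = sorted(gaps)
    # Simple nearest-rank style p95 over the gaps.
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    median = statistics.median(gaps)
    return {
        "tokens_per_second": len(gaps) / elapsed,
        "median_gap_s": median,
        "p95_gap_s": p95,
        "max_gap_s": ordered[-1],
        # Assumed threshold: a gap over 3x the median counts as a spike.
        "spike_count": sum(1 for gap in gaps if gap > 3 * median),
    }


if __name__ == "__main__":
    # Synthetic trace: steady 1 s gaps with a single 10 s stall.
    print(summarize_decode([0.0, 1.0, 2.0, 3.0, 13.0, 14.0]))
```

With SSD-streamed experts, the average tok/s can look fine while individual stalls are long, so it is worth reporting the tail (p95, max, spike count) next to the mean.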

[ BROWNFIELD_PERSPECTIVE ]

Legacy codebase integration strategies...

01.
Integrate agent runs with existing CI to gate PRs: require green tests, and ship agent-suggested patches behind feature flags.
02.
Lock down repo access: run models in isolated runners, audit tool use, and capture prompts/diffs for compliance.
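
The "capture prompts/diffs" step can be sketched as a one-line-per-run audit log. The record layout below is hypothetical (the field names and the choice to store SHA-256 digests next to separately archived full text are assumptions), but it shows the kind of data a compliance review would likely want.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import List


def audit_record(run_id: str, prompt: str, diff: str, tools_used: List[str]) -> str:
    """Build one JSON line describing an agent run (hypothetical schema)."""
    changed = sum(
        1
        for line in diff.splitlines()
        # Count added/removed lines, skipping the ---/+++ file headers.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )
    record = {
        "run_id": run_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Digests let you later verify that archived prompts and diffs are unaltered.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "diff_sha256": hashlib.sha256(diff.encode("utf-8")).hexdigest(),
        "diff_lines_changed": changed,
        "tools_used": sorted(set(tools_used)),
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    example_diff = "--- a/app.py\n+++ b/app.py\n-old_call()\n+new_call()\n"
    print(audit_record("run-1", "Fix the failing test", example_diff, ["git", "grep", "git"]))
```

Appending these lines to a write-once store from the isolated runner keeps the audit trail outside the agent's reach.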

[ GREENFIELD_PERSPECTIVE ]

Fresh architecture paradigms...

01.
Design repos for agents: consistent test scaffolds, richer docstrings, and lightweight code maps to aid localization.
02.
Build an on-prem agent stack early: retrieval, terminal tools, and an evaluation harness around SWE-Bench-like tasks.
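
As one possible reading of "lightweight code maps", the sketch below uses Python's standard `ast` module to list each class and function in a file with its line number and the first line of its docstring. The `code_map` function and its output shape are assumptions for illustration; a real setup might cover other languages or store the map in a file the agent reads first.

```python
import ast
from typing import Dict, List


def code_map(source: str) -> List[Dict[str, object]]:
    """Return one entry per class/function: name, kind, line, docstring summary."""
    tree = ast.parse(source)
    entries = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = (ast.get_docstring(node) or "").strip()
            entries.append(
                {
                    "name": node.name,
                    "kind": "class" if isinstance(node, ast.ClassDef) else "function",
                    "line": node.lineno,
                    # Only the first docstring line, to keep the map compact.
                    "summary": doc.splitlines()[0] if doc else "",
                }
            )
    # ast.walk is breadth-first, so sort by line for a top-to-bottom listing.
    return sorted(entries, key=lambda entry: entry["line"])


if __name__ == "__main__":
    sample = (
        "class Parser:\n"
        '    """Parse config files."""\n'
        "    def load(self, path):\n"
        "        return open(path).read()\n"
    )
    for entry in code_map(sample):
        print(entry)
```

This is also where richer docstrings pay off: an empty `summary` field is a quick signal of code an agent will have to read in full to localize a change.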
