NVIDIA PUB_DATE: 2026.03.19

OPEN-WEIGHT CODING AGENTS HIT 60%+ ON SWE-BENCH AND GET EASIER TO RUN ON-PREM


Open-weight coding agents took a step forward: NVIDIA's Nemotron 3 Super tops SWE-Bench, and new research makes on-prem and local runs easier.

NVIDIA unveiled Nemotron 3 Super, a 120B-parameter hybrid MoE model that scores 60.47% on SWE-Bench Verified. It ships with open weights and recipes, supports a 1M-token context window, and posts strong throughput in NVIDIA's own benchmarks (Smart Chunks). The pitch targets enterprises that want to run agentic coding on their own hardware.

On the research side, CodeScout uses reinforcement learning (RL) to train a code search agent that works with nothing but a Unix shell, and it posts competitive repo-level localization results on SWE-Bench. The team open-sourced the code and models (repo).
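CodeScout's agent is a learned RL policy, and nothing below reproduces it. As a rough illustration of what "shell-only localization" and its scoring can look like, here is a minimal non-learned baseline: it ranks files by how many issue keywords `grep` finds in them, then checks whether any file touched by the reference fix lands in the top k. The names `rank_files_by_grep` and `topk_file_hit`, the keyword input, and the top-k file hit metric are assumptions for illustration, not CodeScout's actual method or evaluation protocol.

```python
import subprocess
from collections import Counter
from pathlib import Path


def rank_files_by_grep(repo_root: str, keywords: list[str]) -> list[str]:
    """Rank files by how many distinct keywords they contain.

    Hypothetical heuristic baseline that only shells out to `grep`;
    it is not CodeScout's trained policy.
    """
    hits: Counter[str] = Counter()
    for kw in keywords:
        # -r: recurse, -l: print each matching file once, -F: fixed string (no regex)
        proc = subprocess.run(
            ["grep", "-rlF", "--", kw, repo_root],
            capture_output=True, text=True, check=False,  # exit code 1 just means "no match"
        )
        for line in proc.stdout.splitlines():
            hits[Path(line).relative_to(repo_root).as_posix()] += 1
    # Most keyword hits first; ties broken alphabetically so results are deterministic
    return sorted(hits, key=lambda f: (-hits[f], f))


def topk_file_hit(ranked: list[str], gold_files: set[str], k: int = 3) -> bool:
    """True if any file edited by the reference patch is among the top-k predictions."""
    return any(f in gold_files for f in ranked[:k])
```

Comparing `topk_file_hit` rates with and without a learned localizer on the same issues is one simple way to see what the RL training buys over plain grep.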

Local inference also got a boost. A community “LLM in a Flash” experiment streamed MoE experts from SSD to run Qwen3.5-397B at 5.5+ tok/s on an M3 Max MacBook with 48GB of memory; the code and a write-up are available (Simon Willison, repo). Separately, a hands-on report finds smaller Qwen3.5 variants already usable in VS Code via LM Studio and Continue, though still behind the top cloud IDE copilots (InfoWorld).
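The SSD-streaming trick works because an MoE model activates only a few experts per token, so the full weight set can stay on disk while a bounded cache keeps recently used experts in RAM. The sketch below shows that caching idea under stated assumptions; it is not the experiment's actual code. `ExpertCache`, the fixed-size-record file layout, and treating an expert as a raw byte blob are all hypothetical, and a real runtime would read quantized tensors and overlap disk I/O with compute.

```python
from collections import OrderedDict
from typing import BinaryIO


class ExpertCache:
    """LRU cache of expert weight blobs, read on demand from a single file on SSD.

    Hypothetical layout: expert i occupies bytes [i * expert_bytes, (i + 1) * expert_bytes).
    """

    def __init__(self, f: BinaryIO, expert_bytes: int, capacity: int) -> None:
        self.f = f
        self.expert_bytes = expert_bytes
        self.capacity = capacity  # max number of experts resident in RAM at once
        self.cache: OrderedDict[int, bytes] = OrderedDict()
        self.disk_reads = 0  # number of SSD fetches (cache misses)

    def get(self, expert_id: int) -> bytes:
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as most recently used
            return self.cache[expert_id]
        # Cache miss: seek to this expert's record and read it from disk
        self.f.seek(expert_id * self.expert_bytes)
        blob = self.f.read(self.expert_bytes)
        self.disk_reads += 1
        self.cache[expert_id] = blob
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used expert
        return blob


def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1e9 bytes), ignoring metadata and activations."""
    return n_params * bits_per_weight / 8 / 1e9
```

For scale, `weights_gb(397e9, 4)` is about 198.5 GB, roughly four times the laptop's 48GB, which is why most experts must live on SSD. The 4-bit width is an assumption for this arithmetic; the bit width used in the experiment is not stated here.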

[ WHY_IT_MATTERS ]
01.

Stronger open weights plus RL-driven repo search shrink the gap with closed coding copilots while keeping code on-prem.

02.

Practical local MoE inference lowers the hardware bar for large models and expands deployment options beyond cloud APIs.

[ WHAT_TO_TEST ]
  • 01.

    Re-run a subset of SWE-Bench Verified, plus similar tasks drawn from your own codebases, comparing Nemotron 3 Super against your current model with and without a CodeScout-style localization step.

  • 02.

    Prototype SSD-streamed MoE inference (flash-moe) on a dev workstation or server, and measure throughput, latency spikes, and output quality against 4- to 8-bit quantized baselines.
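For the flash-moe prototype, average tokens per second can hide the stalls that occur when an expert has to be fetched from SSD. Below is a minimal sketch that assumes you can record a wall-clock timestamp as each token arrives from whichever runtime you test. It reports decode throughput, the median and p99 inter-token gaps, and a count of gaps above a multiple of the median. `stream_stats` and the default `spike_factor=5.0` are arbitrary choices for illustration, not a standard metric.

```python
import math
import statistics


def stream_stats(token_times: list[float], spike_factor: float = 5.0) -> dict[str, float]:
    """Summarize a streamed generation from per-token arrival timestamps (seconds).

    Assumes token_times[0] is the first token's arrival, so prefill is excluded.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two token timestamps")
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]  # inter-token latencies
    median_gap = statistics.median(gaps)
    ordered = sorted(gaps)
    # Nearest-rank p99: crude but adequate for a few hundred samples
    p99_index = min(len(ordered) - 1, math.ceil(0.99 * len(ordered)) - 1)
    elapsed = token_times[-1] - token_times[0]
    return {
        # decode rate: tokens after the first, divided by time since the first
        "tokens_per_s": len(gaps) / elapsed if elapsed > 0 else float("inf"),
        "median_gap_s": median_gap,
        "p99_gap_s": ordered[p99_index],
        # gaps far above the median are likely SSD fetches or other stalls
        "spikes": sum(1 for g in gaps if g > spike_factor * median_gap),
    }
```

Running the same prompts against a fully in-memory 4- or 8-bit baseline gives the reference median gap, so spikes in the streamed run can be attributed to expert loading rather than to the model itself.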

[ BROWNFIELD_PERSPECTIVE ]

Legacy codebase integration strategies...

  • 01.

    Integrate agent runs with existing CI to gate PRs: require passing tests, and keep agent-suggested patches behind feature flags.

  • 02.

    Lock down repo access: run models in isolated runners, audit tool use, and capture prompts/diffs for compliance.
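To make "audit tool use" concrete, here is a minimal sketch that assumes the agent's terminal tool is a Python callable you control. Each requested command is checked against an allowlist and appended to an audit log as one JSON line, whether it runs or is denied. `run_audited` and `ALLOWED_COMMANDS` are hypothetical names. Checking only the first argument is a coarse control (for example, allowing `git` also allows `git push`), so the isolated runner is still what bounds the damage.

```python
import json
import shlex
import subprocess
import time
from typing import TextIO

# Hypothetical allowlist of read-mostly tools an agent needs for localization
ALLOWED_COMMANDS = frozenset({"ls", "cat", "grep", "git"})


def run_audited(command: str, log: TextIO, cwd: str = ".", timeout: float = 30.0) -> str:
    """Run one agent-requested command if allowlisted; append a JSON audit record either way."""
    argv = shlex.split(command)
    allowed = bool(argv) and argv[0] in ALLOWED_COMMANDS
    record = {"ts": time.time(), "cwd": cwd, "command": command, "allowed": allowed}
    if not allowed:
        record["result"] = "denied"
        log.write(json.dumps(record) + "\n")
        raise PermissionError(f"command not allowlisted: {command!r}")
    # argv runs without a shell, so pipes, globs, and redirections are not interpreted
    proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True,
                          timeout=timeout, check=False)
    record.update(result="ran", returncode=proc.returncode,
                  stdout=proc.stdout[:4000], stderr=proc.stderr[:4000])  # truncated to bound log size
    log.write(json.dumps(record) + "\n")
    return proc.stdout
```

Pointing `log` at an append-only file in the runner gives a per-session record that can be stored alongside the prompts and diffs.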

[ GREENFIELD_PERSPECTIVE ]

Fresh architecture paradigms...

  • 01.

    Design repos for agents: consistent test scaffolds, richer docstrings, and lightweight code maps to aid localization.

  • 02.

    Build an on-prem agent stack early: retrieval, terminal tools, and an evaluation harness around SWE-Bench-like tasks.
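One cheap form of the "lightweight code maps" idea, sketched for Python repos only: walk the tree with the standard `ast` module and emit one line per top-level class or function, with its line number and the first line of its docstring. The `code_map` function and its output format are hypothetical; a real map might also cover methods, other languages, or import edges.

```python
import ast
from pathlib import Path


def code_map(repo_root: str) -> dict[str, list[str]]:
    """Map each .py file to one line per top-level class/function, with its docstring summary."""
    root = Path(repo_root)
    result: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # unparseable files are left out; an agent can still grep them
        entries = []
        for node in tree.body:  # top level only, to keep the map small
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                kind = "class" if isinstance(node, ast.ClassDef) else "def"
                doc = ast.get_docstring(node)
                summary = doc.splitlines()[0] if doc else ""
                line = f"{kind} {node.name} (line {node.lineno})"
                entries.append(f"{line}: {summary}" if summary else line)
        result[path.relative_to(root).as_posix()] = entries
    return result
```

Regenerating the map in CI keeps it in sync with the code, and richer docstrings directly improve the summaries it produces.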
