Karpathy’s 2025 LLM themes: RLVR, jagged intelligence, and vibe coding

GENERAL PUB_DATE: 2026.W01


Two third-party breakdowns of Karpathy’s 2025 review highlight three themes: a shift toward reinforcement learning from verifiable rewards (RLVR), where the reward comes from objective checks such as tests and compilers; acceptance of "jagged" capability profiles; and "vibe coding", described here as agentic, tool-using code workflows integrated with IDE/CI. For backend/data teams, this points to focusing AI assistance on tasks with objective checks (unit tests, schemas/contracts) and wiring agents to real tools (repos, runners, linters) rather than relying on prompts alone.
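To make "verifiable reward" concrete, here is a minimal sketch of a binary reward that a test suite and a parse step could provide. The function name `verifiable_reward` and the use of plain `assert` statements as the test suite are assumptions for illustration; none of this comes from Karpathy's review or the breakdowns. A real setup would run candidates in a sandbox with a timeout, since this sketch executes the code in-process.

```python
import ast


def verifiable_reward(candidate_source: str, test_source: str) -> float:
    """Return 1.0 if the candidate parses and passes the tests, else 0.0.

    Hypothetical helper: the reward is objective because it depends only on
    the parser and the assertions, not on a human or model judgment.
    """
    # Step 1: "compiler" check. Reject anything that is not valid Python.
    try:
        ast.parse(candidate_source)
    except SyntaxError:
        return 0.0

    # Step 2: run the candidate, then the tests, in a shared namespace.
    # WARNING: exec() on untrusted code is unsafe outside a sandbox.
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)
        exec(test_source, namespace)
    except Exception:
        # Any runtime error or failed assertion counts as a failed check.
        return 0.0
    return 1.0


if __name__ == "__main__":
    tests = "assert add(2, 3) == 5\n"
    good = "def add(a, b):\n    return a + b\n"
    bad = "def add(a, b):\n    return a - b\n"
    print(verifiable_reward(good, tests))
    print(verifiable_reward(bad, tests))
```

The same shape applies to SQL or schema work: swap the parse step for a validator and the assertions for contract checks.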

[ WHY_IT_MATTERS ]

01. Constrain LLM work to tasks with objective pass/fail signals (tests, type checks, SQL validators) to get reliable wins.

02. Uneven model strengths call for routing, fallback models, and human-in-the-loop review on the hard edge cases.
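The two points above can be combined into one control flow: try models in order, accept the first output that passes an objective check, and escalate to a person when none does. The sketch below assumes each model is exposed as a plain callable; `route_with_fallback`, `RouteResult`, and the stub models in the usage section are hypothetical names, not part of any real SDK.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Generator = Callable[[str], str]  # hypothetical: task description -> candidate output
Checker = Callable[[str], bool]   # objective pass/fail check on a candidate


@dataclass
class RouteResult:
    output: Optional[str]  # accepted candidate, or None if nothing passed
    model: Optional[str]   # name of the model whose output was accepted
    needs_human: bool      # True when every model failed the check


def route_with_fallback(
    task: str,
    models: Sequence[tuple[str, Generator]],
    check: Checker,
) -> RouteResult:
    """Try each model in order and return the first output that passes the check."""
    for name, generate in models:
        try:
            candidate = generate(task)
        except Exception:
            # A model call that errors out is treated like a failed attempt.
            continue
        if check(candidate):
            return RouteResult(output=candidate, model=name, needs_human=False)
    # No model produced a passing candidate: hand the task to a person.
    return RouteResult(output=None, model=None, needs_human=True)


if __name__ == "__main__":
    # Stub models standing in for real LLM calls.
    def cheap_model(task: str) -> str:
        return "SELEC 1"

    def strong_model(task: str) -> str:
        return "SELECT 1"

    result = route_with_fallback(
        "write a trivial query",
        [("cheap", cheap_model), ("strong", strong_model)],
        check=lambda sql: sql.startswith("SELECT"),
    )
    print(result)
```

Ordering the list from cheapest to strongest model keeps cost down on easy tasks while still covering the jagged edges.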

[ WHAT_TO_TEST ]

Create evals where LLM-generated Python/SQL must pass unit tests, linters, and migration checks; track pass@k, fix rate, and time-to-green in CI.
Prototype an IDE/CI agent that can run tools (pytest, mypy, sqlfluff, docker) and compare against prompt-only baselines for accuracy and latency.
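For the pass@k metric mentioned above, a common choice is the unbiased estimator popularized by code-generation benchmarks: sample n candidates per task, count the c that pass all checks, and compute 1 - C(n-c, k) / C(n, k). The sketch below assumes that definition; the function names and the sample counts in the usage section are illustrative, not measured results.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: candidates sampled for the task
    c: candidates that passed every check (tests, linters, migrations)
    k: sample budget being evaluated, with 1 <= k <= n
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given (n, c) pairs per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Illustrative counts only: (samples drawn, samples that passed) per task.
    per_task = [(10, 3), (10, 0), (10, 10)]
    print(mean_pass_at_k(per_task, k=1))
    print(mean_pass_at_k(per_task, k=5))
```

Fix rate and time-to-green have no single standard definition, so a team would need to pin those down before tracking them alongside pass@k.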
