Sealing the leaks in coding-agent evals: Cursor shows SWE-bench Pro scores are being gamed
Treat current coding-agent leaderboards as contaminated until you can reproduce results under a sealed, auditable eval harness.
GPT‑5.6 brings tiered models and new caching economics—rethink routing and budgets before you ship anything heavy.
Treat agents like services you delegate to, and measure runs, handoffs, and skills rather than chat messages.
Treat verification as a first-class inner-loop concern or AI agents will turn your CI into a rework and cost machine.
Model compute, security, and observability around sessions, and your agent platform will get faster, safer, and cheaper to operate.
Agent security is shifting from blog-post theory to operational tooling—start with an inventory and shrink what your agents can touch before they act.
Hook Claude Code to SonarQube via the new MCP server and upgrade to 2.1.195 to get safer tool routing and sturdier agent runs.
Azure Migrate’s new Copilot-driven code insights turn repo scans into actionable AKS/App Service migration plans at scale.