howtonotcode.com
Claude Opus 4.6

AI Tool

Claude Opus 4.6 is a large language model by Anthropic.

16 stories · First seen: 2026-02-09 · Last seen: 2026-03-03

Stories

Coding Benchmarks Shake-up: Qwen 3.5, MiniMax M2.5, and a SWE-bench Reality Check

Open models like Alibaba’s Qwen 3.5 and MiniMax M2.5 post strong coding-agent results, but OpenAI’s audit of SWE-bench Verified shows contamination and flawed tests that can mislead real-world adoption. Alibaba’s Qwen 3.5 family uses a sparse MoE design (397B total/17B active), ships open weights under Apache 2.0, and shows strong instruction following and competitive coding scores in public benchmarks; setup guidance and comparisons to frontier models are detailed in this deep-dive guide: [Qwen 3.5: The Complete Guide](https://techie007.substack.com/p/qwen-35-the-complete-guide-benchmarks). MiniMax’s latest model claims state-of-the-art coding and agentic performance, faster task completion, and ultra-low runtime cost (about $1/hour at 100 tok/s), alongside reported scores on coding and browsing evaluations ([MiniMax-M2.5 on Hugging Face](https://huggingface.co/unsloth/MiniMax-M2.5)). OpenAI, however, reports that many SWE-bench Verified tasks have broken tests and that major models were trained on benchmark solutions; it has stopped using the metric and urges caution in interpreting scores ([OpenAI Abandons SWE-bench Verified](https://blockchain.news/news/openai-abandons-swe-bench-verified-contamination-flawed-tests)). For quick, low-cost trials of multiple “top models,” a short explainer points to an Alibaba Cloud coding plan bundling popular options ([This $3 AI Coding Plan Gives You Every Top Model You Need](https://www.youtube.com/watch?v=Qnz7S-5fzWo&pp=ygUXbmV3IEFJIG1vZGVsIGZvciBjb2RpbmfSBwkJrgoBhyohjO8%3D)).

2026-03-03
qwen-35 alibaba alibaba-cloud minimax-m25 openai

Claude Code v2.1.49 hardens long-running agents, adds audit hooks, and moves Max users to Sonnet 4.6 (1M)

Anthropic shipped Claude Code v2.1.49 with major stability and performance fixes for long-running sessions, new enterprise audit controls, and a Max-plan model shift to Sonnet 4.6 with a 1M-token context window. The v2.1.49 release notes highlight fixes for memory growth in WASM parsing and layout engines and for background agent interrupt handling (double Ctrl+C/ESC), along with faster non-interactive startup (-p), plugin scope auto-detection, and a prompt cache regression fix. In addition, simple mode gains a direct file edit tool, and SDKs now expose capability flags like supportsAdaptiveThinking ([release](https://github.com/anthropics/claude-code/releases/tag/v2.1.49)). Enterprise teams get a new ConfigChange hook to log or block config edits mid-session, and Max-plan users should switch to Sonnet 4.6 (1M) because Sonnet 4.5 (1M) is being removed. For context on why these changes matter, Anthropic engineers have emphasized prompt caching as key to cost/latency on long-lived agent workflows ([note](https://simonwillison.net/2026/Feb/20/thariq-shihipar/#atom-everything)), and leadership is openly reframing developer roles toward reviewing and steering AI-authored code rather than typing it by hand ([Boris Cherny interview](https://www.youtube.com/watch?v=We7BZVKbCVw&t=977s&pp=ygUYQUkgY29kaW5nIGFnZW50IHdvcmtmbG93)).

2026-02-20
claude-code anthropic claude-sonnet-46 github sonnet-46

Windsurf ships new models, Linux ARM64, and enterprise hooks

Windsurf rolled out new frontier coding models, full Linux ARM64 support, and enterprise-grade Cascade Hooks while community feedback spotlights its transparent crediting versus rivals' opaque limits. Windsurf’s latest updates add Gemini 3.1 Pro, Claude Sonnet 4.6, GLM-5, MiniMax M2.5, and GPT-5.3-Codex-Spark with time-limited credit multipliers, plus quality-of-life fixes and features like automatic Plan→Code switching, skills loading from .agents/skills, tracked rules in post_cascade_response, and diff zones auto-closing on commit. It also now provides full Linux ARM64 deb/rpm packages and enterprise cloud config for Cascade Hooks with Devin service key auth, as detailed in the [Windsurf changelog](https://windsurf.com/changelog). A power user’s comparison underscores cost control and predictability: they favored Windsurf’s clear credit model over the rate-limit surprises of Cursor and Claude Code, and kept GitHub Copilot Pro+ for predictable premium requests while continuing to code primarily in Windsurf, per this [Reddit write-up](https://www.reddit.com/r/windsurf/comments/1r9b58e/i_almost_left_windsurf/).

2026-02-20
windsurf gemini-31-pro claude-sonnet-46 glm-5 minimax-m25

Delegation vs. coordination: Codex 5.3 or Opus 4.6 for your engineering workflows

OpenAI’s Codex 5.3 favors long-running autonomous delegation while Anthropic’s Opus 4.6 favors coordinated, tool-integrated agent teams, and picking one early will shape your workflows and switching costs. In this analysis of two same-day releases, Codex 5.3 is framed as an agent you hand a task to and walk away from for hours, whereas Opus 4.6 is positioned to plug into your existing tools, orchestrate agent teams, and extend beyond code into broader knowledge work ([read the comparison](https://natesnewsletter.substack.com/p/codex-53-vs-opus-46-two-agent-philosophies)). The piece contrasts a “correctness architecture” for Codex—aimed at producing work you can trust without reviewing every line—against Claude’s integration-first approach with a protocol layer and agent teams. For engineering leaders, the key moves are a workflow audit (which tasks benefit from autonomy vs. coordination), explicit correctness gates, and an understanding that this choice compounds—affecting org structure, toolchains, and the difficulty of switching later ([full brief](https://natesnewsletter.substack.com/p/codex-53-vs-opus-46-two-agent-philosophies)).

2026-02-17
openai anthropic codex-53 claude-opus-46 claude

GLM-5 and MiniMax M2.5 push low-cost, agentic coding into production range

Two Chinese releases—Zhipu AI’s GLM-5 and MiniMax M2.5—signal a shift toward affordable, agentic coding models that challenge frontier systems on practical benchmarks. Zhipu AI’s GLM-5 is positioned as an MIT-licensed open model with a native Agent Mode that rivals proprietary leaders on multiple benchmarks, with a deep-dive detailing its pre-launch appearance under a pseudonym and hints from vLLM pull requests ([official overview](https://z.ai/blog/glm-5?_bhlid=d84a093754c9e11cb0d2e9ff416fd99cb5f0e2da), [leak analysis](https://medium.com/reading-sh/glm-5-chinas-745b-parameter-open-source-model-that-leaked-before-it-launched-b2cfbafe99ef?source=rss-8af100df272------2), [weights claim](https://medium.com/ai-software-engineer/glm-5-arrive-with-a-bang-from-vibe-coding-to-agentic-engineering-disrupts-opus-b2b13f02b819)). MiniMax’s M2.5 posts strong results on coding and agentic tasks—80.2% SWE-Bench Verified, 51.3% Multi-SWE-Bench, 76.3% BrowseComp—while running 37% faster than M2.1 and costing roughly $1/hour at 100 tokens/sec (or $0.30/hour at 50 tps), with speed reportedly matching Claude Opus 4.6 ([release details](https://www.minimax.io/news/minimax-m25)). For developer workflows, quick-start videos show GLM-5 (and similarly Kimi K2.5) slotting into Claude Code with minimal setup, lowering trial friction inside existing IDEs ([GLM-5 with Claude Code](https://www.youtube.com/watch?v=Ey-HW-nJBiw&pp=ygURQ3Vyc29yIElERSB1cGRhdGU%3D), [Kimi K2.5 with Claude Code](https://www.youtube.com/watch?v=yZtLwOhmHps&pp=ygURQ3Vyc29yIElERSB1cGRhdGU%3D)).
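The hourly prices quoted above can be converted into a per-token figure for easier comparison with per-token API pricing. The sketch below is only arithmetic on the two quoted data points; it assumes the model generates continuously at the stated rate for the full hour and that the hourly price covers output tokens only, neither of which the summary confirms. The function names are illustrative.

```python
def tokens_per_hour(tokens_per_second: float) -> float:
    """Tokens generated in one hour at a constant rate (assumed, not measured)."""
    return tokens_per_second * 3600


def usd_per_million_tokens(usd_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly price at a given throughput into USD per 1M tokens."""
    return usd_per_hour / tokens_per_hour(tokens_per_second) * 1_000_000


# The two price points quoted in the summary above.
fast_tier = usd_per_million_tokens(1.00, 100)  # $1/hour at 100 tokens/sec
slow_tier = usd_per_million_tokens(0.30, 50)   # $0.30/hour at 50 tokens/sec

print(f"100 tok/s: {tokens_per_hour(100):,.0f} tokens/hour, ${fast_tier:.2f} per 1M tokens")
print(f" 50 tok/s: {tokens_per_hour(50):,.0f} tokens/hour, ${slow_tier:.2f} per 1M tokens")
```

Under these assumptions the faster tier works out to roughly $2.78 per million tokens and the slower tier to roughly $1.67, so the slower tier is cheaper per token as well as per hour.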

2026-02-12
zhipu-ai glm-5 minimax minimax-m25 openrouter

Claude Code’s agentic push meets release governance

Claude Code is moving from autocomplete to autonomous delivery, and new updates plus governance patterns show how to adopt it safely across backends and data pipelines. Anthropic shipped multiple February hardening updates to Claude Code (2.1.39–2.1.42) that add a guard against nested sessions, clearer Bedrock/Vertex/Foundry fallbacks, CLI auth, Windows ARM64 support, and richer OpenTelemetry spans via a new speed attribute ([release notes](https://releasebot.io/updates/anthropic/claude-code)). As agentic coding scales beyond snippets to plans, tests, and commits, [Unleash’s guide](https://www.getunleash.io/blog/claude-code-unleash-agentic-ai-release-governance) lays out a FeatureOps playbook (standard flag naming, mandatory gating, and cleanup) tailored to Claude Code’s terminal + MCP architecture. For rollout, pilot Agent Teams on a low-risk service and wire it into CI under flags using this 13‑minute walkthrough ([video](https://www.youtube.com/watch?v=y9IYtWELMHw&pp=ygUYQUkgY29kaW5nIGFnZW50IHdvcmtmbG93)), scaffold workflows with the community’s [ultimate guide](https://github.com/FlorianBruniaux/claude-code-ultimate-guide), and use this Opus 4.6 technical dive to inform capability boundaries and prompt patterns ([deep dive](https://medium.com/@comeback01/the-arrival-of-claude-opus-4-6-a-technical-deep-dive-into-the-enterprise-ai-singularity-0f86002836c1)).
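The three FeatureOps practices named above (standard flag naming, mandatory gating, and cleanup) can be illustrated with a small, library-free sketch. Everything here is hypothetical: the `team.feature.type` naming pattern, the `Flag` record, and the helper names are assumptions made for illustration, and none of it is the Unleash SDK or the guide's actual convention.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

# Hypothetical naming convention: <team>.<feature>.<type>, lowercase with hyphens.
FLAG_NAME = re.compile(
    r"[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*\.(release|experiment|killswitch)"
)


@dataclass(frozen=True)
class Flag:
    name: str
    enabled: bool
    remove_by: date  # cleanup deadline agreed when the flag is created


def is_valid_name(name: str) -> bool:
    """Standard naming: reject flags that do not follow the convention."""
    return FLAG_NAME.fullmatch(name) is not None


def gated(flag: Flag, new_path: Callable[[], T], old_path: Callable[[], T]) -> T:
    """Mandatory gating: agent-authored code runs only when its flag is on."""
    if not is_valid_name(flag.name):
        raise ValueError(f"flag name violates convention: {flag.name!r}")
    return new_path() if flag.enabled else old_path()


def stale_flags(flags: Iterable[Flag], today: date) -> List[str]:
    """Cleanup: list flags whose removal deadline has passed."""
    return [f.name for f in flags if f.remove_by < today]


# Example: a disabled flag keeps traffic on the existing code path.
retry_flag = Flag("payments.retry-logic.release", enabled=False, remove_by=date(2026, 3, 1))
result = gated(retry_flag, new_path=lambda: "agent-written path", old_path=lambda: "existing path")
overdue = stale_flags([retry_flag], today=date(2026, 4, 1))
print(result, overdue)
```

In a real rollout the flag state would come from a flag service rather than a local dataclass; the point of the sketch is only that naming, gating, and cleanup can each be checked mechanically, for example in CI.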

2026-02-12
anthropic claude-code unleash claude-opus-46 bedrock

Claude Opus 4.6 adds agent teams, 1M context, and fast mode; GPT-5.3-Codex counters

Anthropic’s Claude Opus 4.6 ships multi-agent coding, a 1M-token context window, and a 2.5x fast mode, while OpenAI’s GPT-5.3-Codex brings faster agentic coding with strong benchmark results. DeepLearning.ai details Opus 4.6’s long-context, agentic coding gains, new API controls, and Codex 5.3’s speed and scores, plus pricing context [Data Points: Claude Opus 4.6 pushes the envelope](https://www.deeplearning.ai/the-batch/claude-opus-4-6-pushes-the-envelope/)[^1]. AI Collective highlights Claude Code’s new multi-agent “agent teams,” Office sidebars, and head-to-head benchmark moves versus OpenAI, while Storyboard18 confirms a 2.5x “fast mode” rollout for urgent work [Anthropic’s Opus 4.6 Agent Teams & OpenAI’s Codex 5.3](https://aicollective.substack.com/p/the-brief-anthropics-opus-46-agent)[^2] and [Anthropic rolls out fast mode for Claude Code](https://www.storyboard18.com/digital/anthropic-rolls-out-fast-mode-for-claude-code-to-speed-up-developer-workflows-89148.htm)[^3].

[^1]: Roundup covering features, benchmarks, and pricing for Opus 4.6 and GPT‑5.3‑Codex.
[^2]: Newsletter with details on "agent teams," 1M-context performance, Office integrations, and comparative benchmarks.
[^3]: Report on the 2.5x faster "fast mode" availability and target use cases.

2026-02-09
anthropic claude-opus-46 claude-code openai gpt-53-codex

Codex 5.3 vs Opus 4.6: agentic speed vs long‑context depth

OpenAI's GPT-5.3 Codex and Anthropic's Claude Opus 4.6 arrive with distinct strengths: Codex favors faster agentic execution while Opus excels at long-context reasoning and consistency, so choose based on workflow fit, not hype. Independent hands-on comparisons report that Codex 5.3 is snappier and stronger at end-to-end coding actions, while Opus 4.6 is more reliable with context and needs less babysitting for routine repo tasks; benchmark numbers and capability lists outline the trade-offs in real projects ([Interconnects](https://www.interconnects.ai/p/opus-46-vs-codex-53)[^1], [Tensorlake](https://www.tensorlake.ai/blog/claude-opus-4-6-vs-gpt-5-3-codex)[^2]). Opus adds agent teams, 1M-token context (beta), and adaptive effort controls, while Codex claims ~25% speed gains and agentic improvements, underscoring a shift toward practical, multi-step workflows ([Elephas](https://elephas.app/resources/claude-opus-4-6-vs-gpt-5-3-codex)[^3]).

[^1]: Adds: Usability differences from field use; Opus needs less supervision on mundane tasks while Codex 5.3 improved but can misplace/skip files.
[^2]: Adds: Concrete benchmarks (SWE Bench Pro, Terminal Bench 2.0, OSWorld) and scenario-based comparison for UI/data workflows.
[^3]: Adds: Feature deltas (Agent Teams, 1M context, adaptive thinking) and speed claims/timing details across both launches.

2026-02-09
openai anthropic gpt-53-codex claude-opus-46 claude-code

Opus 4.6 Agent Teams vs GPT-5.3 Codex: multi‑agent coding arrives for real SDLC work

Anthropic's Claude Opus 4.6 brings multi-agent "Agent Teams" and a 1M-token context while OpenAI's GPT-5.3-Codex counters with faster, stronger agentic coding, together signaling a step change in AI-assisted development. Opus 4.6 adds team-based parallelization in Claude Code, long‑context retrieval gains, adaptive reasoning/effort controls, and Office sidebars, with pricing unchanged [Data Points](https://www.deeplearning.ai/the-batch/claude-opus-4-6-pushes-the-envelope/)[^1] and launch coverage framing initial benchmark leads at release [AI Collective](https://aicollective.substack.com/p/the-brief-anthropics-opus-46-agent)[^2]. OpenAI’s GPT‑5.3‑Codex posts top results on SWE‑Bench Pro and Terminal‑Bench 2.0 and helped debug its own training pipeline [Data Points](https://www.deeplearning.ai/the-batch/claude-opus-4-6-pushes-the-envelope/)[^3], while practitioners surface Claude Code’s new Auto‑Memory behavior/controls for safer long‑running projects [Reddit](https://www.reddit.com/r/ClaudeCode/comments/1qzmofn/how_claude_code_automemory_works_official_feature/)[^4] and Anthropic leaders say AI now writes nearly all their internal code [India Today](https://www.indiatoday.in/technology/news/story/anthropic-says-ai-writing-nearly-100-percent-code-internally-claude-basically-writes-itself-now-2865644-2026-02-09)[^5].

[^1]: Adds: Opus 4.6 features (1M context), long‑context results, adaptive/effort/compaction API controls, and unchanged pricing.
[^2]: Adds: Agent Teams in Claude Code, Office (Excel/PowerPoint) sidebars, 1M context, and benchmark framing at launch.
[^3]: Adds: GPT‑5.3‑Codex benchmarks, 25% speedup, availability, and self‑use in OpenAI’s training/deployment pipeline.
[^4]: Adds: Concrete Auto‑Memory details (location, 200‑line cap) and disable flag for policy compliance.
[^5]: Adds: Real‑world claim of near‑100% AI‑written internal code at Anthropic, indicating mature SDLC use.

2026-02-09
anthropic openai claude-opus-46 claude-code gpt-53-codex

Copilot CLI 0.0.406 adds MCP upgrades and Claude preview; community proxy unlocks Copilot in Cursor

GitHub Copilot CLI 0.0.406 brings MCP-focused UX improvements, a Claude Opus 4.6 Fast preview, and safer flags, while a community proxy shows how to use a Copilot subscription inside Cursor’s Agent features. Per the official notes, v0.0.406 adds Claude Opus 4.6 Fast support, command-to-skill translation, /changelog, MCP status, structured responses for VS Code, URL-based plugin marketplace, and a --no-experimental flag [GitHub Copilot CLI releases](https://github.com/github/copilot-cli/releases)[^1]. A community guide details a "Copilot Proxy for Cursor" that routes Cursor to your Copilot key with MCP/tool support and vision handling; use cautiously given it relies on internal APIs [DEV: Unlock GitHub Copilot in Cursor](https://dev.to/jacksonkasi/unlock-github-copilot-in-cursor-the-ultimate-guide-free-unlimited-4i9c)[^2].

[^1]: Adds: Official 0.0.406 features, MCP/skills changes, and safety flags.
[^2]: Adds: How the proxy works, setup steps, supported models/tools, and caveats.

2026-02-07
github-copilot github copilot-cli cursor anthropic

Hands-on: Claude Opus 4.6 nails non‑agentic coding; GPT‑5.3 Codex lacks API

A 48-hour hands-on found Claude Opus 4.6 delivering perfect non-agentic coding results while GPT‑5.3 Codex looks strong in benchmarks but still lacks API access for validation. In this test run, Opus 4.6 hit 100% across 11 single-shot coding tasks (including 3D layout, SVG composition, and legal-move chess) and contradicted popular benchmark narratives, while the Codex results couldn’t be reproduced because the model has no API access yet, per this report [I Spent 48 Hours Testing Claude Opus 4.6 & GPT-5.3 Codex](https://medium.com/@info.booststash/i-spent-48-hours-testing-claude-opus-4-6-gpt-5-3-codex-004adc046312)[^1].

[^1]: Adds: hands-on results, examples, benchmark context, and note on GPT‑5.3 Codex API unavailability.

2026-02-07
claude-opus-46 gpt-53-codex anthropic openai terminal-bench

Claude Code Opus 4.6 adds Fast mode and native Agent Teams

Claude Code now ships Fast mode for Opus 4.6 and native Agent Teams, plus a hotfix that makes /fast immediately available after enabling extra usage. Release notes confirm Fast mode for Opus 4.6 and the /fast availability fix, with setup docs for toggling and usage [here](https://github.com/anthropics/claude-code/releases)[^1] and [here](https://code.claude.com/docs/en/fast-mode)[^2]. Walkthroughs show how to stand up Agent Teams and add lightweight persistent memory so the agent keeps project context across sessions [here](https://www.youtube.com/watch?v=QXqnZsPLix8&pp=ygUSQ2xhdWRlIENvZGUgdXBkYXRl0gcJCZEKAYcqIYzv)[^3] and [here](https://www.youtube.com/watch?v=ryqpGVWRQxA&pp=ygUSQ2xhdWRlIENvZGUgdXBkYXRl)[^4].

[^1]: Adds: official v2.1.36/37 release notes (Fast mode enabled for Opus 4.6; /fast availability fix) and prior sandbox bug fix.
[^2]: Adds: official Fast mode documentation and guidance.
[^3]: Adds: hands-on demo and setup steps for native Agent Teams in Claude Code V3.
[^4]: Adds: tutorial to implement persistent memory so Claude retains codebase context.

2026-02-07
anthropic claude-code claude-opus-46 fast-mode agent-teams

User flags degraded Claude Opus 4.6 behavior and higher credit burn in Windsurf vs Claude Code

A Reddit report describes noticeably worse results and more credit burn when using Claude Opus 4.6 through Windsurf compared to running the same model via Claude Code directly. The post details unnecessary back-and-forth, confrontational replies, and 2×–4× credit multipliers in [this thread](https://www.reddit.com/r/windsurf/comments/1qxpcfd/is_anyone_else_getting_really_frustrated_with/)[^1].

[^1]: Adds: First-hand comparison of Windsurf vs Claude Code behavior, including examples and credit multipliers.

2026-02-07
windsurf claude-opus-46 claude-code claude-opus ai-coding-assistants