howtonotcode.com

Claude Opus

Term

A term related to Anthropic's Claude AI model and its applications.

16 stories · First seen: 2025-12-30 · Last seen: 2026-03-03 · Website · Wikipedia

Stories


Coding Benchmarks Shake-up: Qwen 3.5, MiniMax M2.5, and a SWE-bench Reality Check

Open models like Alibaba’s Qwen 3.5 and MiniMax M2.5 post strong coding-agent results, but OpenAI’s audit of SWE-bench Verified shows contamination and flawed tests that can mislead real-world adoption. Alibaba’s Qwen 3.5 family uses a sparse MoE design (397B total/17B active), ships open weights under Apache 2.0, and shows strong instruction following and competitive coding scores in public benchmarks, with setup guidance and comparisons to frontier models detailed in this deep-dive guide [Qwen 3.5: The Complete Guide](https://techie007.substack.com/p/qwen-35-the-complete-guide-benchmarks). MiniMax’s latest model claims state-of-the-art coding and agentic performance, faster task completion, and ultra-low runtime cost (about $1/hour at 100 tok/s), alongside reported scores on coding and browsing evaluations [MiniMax-M2.5 on Hugging Face](https://huggingface.co/unsloth/MiniMax-M2.5). OpenAI, however, reports that many SWE-bench Verified tasks have broken tests and that major models were trained on benchmark solutions, halting its use of the metric and urging caution in interpreting scores [OpenAI Abandons SWE-bench Verified](https://blockchain.news/news/openai-abandons-swe-bench-verified-contamination-flawed-tests). For quick, low-cost trials of multiple “top models,” a short explainer points to an Alibaba Cloud coding plan bundling popular options [This $3 AI Coding Plan Gives You Every Top Model You Need](https://www.youtube.com/watch?v=Qnz7S-5fzWo&pp=ygUXbmV3IEFJIG1vZGVsIGZvciBjb2RpbmfSBwkJrgoBhyohjO8%3D).

2026-03-03
qwen-35 alibaba alibaba-cloud minimax-m25 openai

Claude Code v2.1.49 hardens long-running agents, adds audit hooks, and moves Max users to Sonnet 4.6 (1M)

Anthropic shipped Claude Code v2.1.49 with major stability and performance fixes for long-running sessions, new enterprise audit controls, and a Max-plan model shift to Sonnet 4.6 with a 1M-token context window. The v2.1.49 release notes highlight concrete fixes for memory growth in WASM parsing and layout engines, background agent interrupt handling (double Ctrl+C/ESC), faster non-interactive startup (-p), plugin scope auto-detection, and a prompt cache regression fix, plus simple mode gains a direct file edit tool and SDKs now expose capability flags like supportsAdaptiveThinking ([release](https://github.com/anthropics/claude-code/releases/tag/v2.1.49)). Enterprise teams get a new ConfigChange hook to log or block config edits mid-session, and Max-plan users should switch to Sonnet 4.6 (1M) as Sonnet 4.5 (1M) is being removed. For context on why these changes matter, Anthropic engineers have emphasized prompt caching as key to cost/latency on long-lived agent workflows ([note](https://simonwillison.net/2026/Feb/20/thariq-shihipar/#atom-everything)), and leadership is openly reframing developer roles toward reviewing and steering AI-authored code rather than typing it by hand ([Boris Cherny interview](https://www.youtube.com/watch?v=We7BZVKbCVw&t=977s&pp=ygUYQUkgY29kaW5nIGFnZW50IHdvcmtmbG93)).
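As a rough illustration of how an audit hook like ConfigChange could be wired up: Claude Code hooks generally receive event data as JSON on stdin and can veto an action via a nonzero exit code. The payload field names below (`setting`, `new_value`) and the protected-setting policy are illustrative assumptions, not a documented schema.

```python
"""Sketch of a ConfigChange hook handler for Claude Code's new audit control.
Hooks receive event JSON on stdin; a nonzero exit code blocks the action.
Field names ("setting", "new_value") are illustrative assumptions."""
import json
import sys

PROTECTED = {"permissions", "apiKeyHelper"}  # example: settings locked mid-session

def handle(event):
    setting = event.get("setting", "")
    # Always log the attempted edit for the audit trail.
    print(f"config change attempt: {setting!r} -> {event.get('new_value')!r}",
          file=sys.stderr)
    if setting in PROTECTED:
        return 2   # nonzero exit blocks the edit
    return 0       # allow everything else

if __name__ == "__main__":
    # In a real hook the event would arrive on stdin:
    #   sys.exit(handle(json.load(sys.stdin)))
    handle({"setting": "permissions", "new_value": "bypassPermissions"})
```

The same handler covers both enterprise needs named above: logging every config edit, and blocking a configurable subset outright.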

2026-02-20
claude-code anthropic claude-sonnet-46 github sonnet-46

Open-weight "AI engineer" models arrive: Qwen 3.5, GLM-5, MiniMax M2.5

A new wave of open-weight frontier models now rivals closed systems on coding and long-horizon agent tasks, making self-hosted AI engineer workflows practical for backend and data teams. Alibaba’s Qwen 3.5 ships as an open‑weights Mixture‑of‑Experts model (397B total, 17B active) with multimodal input and a 256K context, alongside a hosted Qwen3.5‑Plus variant offering 1M context and built‑in tools; details and early impressions are summarized by Simon Willison’s write‑up of the [Qwen 3.5 release](https://simonwillison.net/2026/Feb/17/qwen35/#atom-everything) and the official [Qwen blog](https://qwen.ai/blog?id=qwen3.5). Z.ai’s GLM‑5 launched open source with top open-model scores on SWE‑bench‑Verified (77.8) and Terminal Bench 2.0 (56.2), plus long‑context and RL‑driven agent training advances, with the announcement and code at [BusinessWire](https://www.businesswire.com/news/home/20260215030665/en/GLM-5-Launch-Signals-a-New-Era-in-AI-When-Models-Become-Engineers) and the [GitHub repo](https://github.com/zai-org/GLM-5). MiniMax M2.5 claims state‑of‑the‑art coding/agent performance (e.g., 80.2% SWE‑Bench Verified) and aggressive cost/speed on its [Hugging Face card](https://huggingface.co/unsloth/MiniMax-M2.5), while hands‑on videos compare real coding runs for GLM‑5 and M2.5; you can also quickly trial free models via [OpenRouter’s free router](https://openrouter.ai/openrouter/free).
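The "397B total, 17B active" figure means only about 4% of Qwen 3.5's weights run per token. A toy sketch of the underlying top-k routing idea, with made-up sizes far smaller than any real MoE layer:

```python
import math

def topk_route(gate_logits, k=2):
    """Pick the k highest-scoring experts; softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)), key=gate_logits.__getitem__, reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_logits, k=2):
    """Only k of len(experts) expert networks actually run per token."""
    return sum(w * experts[i](x) for i, w in topk_route(gate_logits, k))

# Toy layer: 8 scalar "experts", 2 active per token -- the same sparsity idea
# behind 17B-active-of-397B-total, at a vastly smaller scale.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
y = moe_forward(2.0, experts, gate_logits=[0.1, 3.0, 0.2, 0.1, 2.0, 0.1, 0.1, 0.1])
```

Because the gate selects experts per token, total parameter count (capacity) and per-token compute decouple, which is why such models can ship huge weights yet serve cheaply.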

2026-02-17
qwen35-397b-a17b qwen35-plus qwen-chat alibaba-cloud glm-5

Anthropic’s Claude Code pushes into regulated enterprises as devs demand more agent transparency

Anthropic is expanding Claude Code from internal-heavy code generation to regulated enterprise use while shipping updates and fielding developer concerns about opaque agent behavior. Anthropic says its AI systems now generate nearly all of the company’s internal code, reframing engineers’ roles toward system design and review as described in this report from Moneycontrol ([source](https://www.moneycontrol.com/news/business/information-technology/why-anthropic-says-engineers-matter-more-than-ever-even-as-ai-writes-the-code-13830811.html)). Building on that, Anthropic announced a collaboration with Infosys to deliver agentic AI for telecom, financial services, and manufacturing via Infosys Topaz and the Claude Agent SDK, targeting persistent, multi-step workflows with governance needs ([announcement](https://www.anthropic.com/news/anthropic-infosys)). AWS also outlined how to run Claude Code in compliance-sensitive environments on Amazon Bedrock, aimed at aligning AI-assisted dev work with strict controls ([AWS blog](https://aws.amazon.com/blogs/machine-learning/supercharge-regulated-workloads-with-claude-code-and-amazon-bedrock/)). On the ground, developers called out visibility gaps around what agents do to their codebases in a widely discussed Hacker News thread ([discussion](https://news.ycombinator.com/item?id=47033622)), even as Anthropic continues frequent incremental fixes such as auth refresh repairs and improved error messaging in recent Claude Code releases ([release notes](https://github.com/anthropics/claude-code/releases)). Community demos show evolving workflows—like Plan Mode and multi-agent patterns in Opus 4.6—that promise more autonomous execution but heighten the need for auditability ([Plan Mode walkthrough](https://www.youtube.com/watch?v=fxj82iBWypA&pp=ygUSQ2xhdWRlIENvZGUgdXBkYXRl), [Agent Teams demo](https://www.youtube.com/watch?v=6UKUQNcRk2k&pp=ygUYQUkgY29kaW5nIGFnZW50IHdvcmtmbG93)).

2026-02-17
anthropic claude claude-code claude-agent-sdk infosys

GLM-5 and MiniMax M2.5 push low-cost, agentic coding into production range

Two Chinese releases—Zhipu AI’s GLM-5 and MiniMax M2.5—signal a shift toward affordable, agentic coding models that challenge frontier systems on practical benchmarks. Zhipu AI’s GLM-5 is positioned as an MIT-licensed open model with a native Agent Mode that rivals proprietary leaders on multiple benchmarks, with a deep-dive detailing its pre-launch appearance under a pseudonym and hints from vLLM pull requests ([official overview](https://z.ai/blog/glm-5?_bhlid=d84a093754c9e11cb0d2e9ff416fd99cb5f0e2da), [leak analysis](https://medium.com/reading-sh/glm-5-chinas-745b-parameter-open-source-model-that-leaked-before-it-launched-b2cfbafe99ef?source=rss-8af100df272------2), [weights claim](https://medium.com/ai-software-engineer/glm-5-arrive-with-a-bang-from-vibe-coding-to-agentic-engineering-disrupts-opus-b2b13f02b819)). MiniMax’s M2.5 posts strong results on coding and agentic tasks—80.2% SWE-Bench Verified, 51.3% Multi-SWE-Bench, 76.3% BrowseComp—while running 37% faster than M2.1 and costing roughly $1/hour at 100 tokens/sec (or $0.30/hour at 50 tps), with speed reportedly matching Claude Opus 4.6 ([release details](https://www.minimax.io/news/minimax-m25)). For developer workflows, quick-start videos show GLM-5 (and similarly Kimi K2.5) slotting into Claude Code with minimal setup, lowering trial friction inside existing IDEs ([GLM-5 with Claude Code](https://www.youtube.com/watch?v=Ey-HW-nJBiw&pp=ygURQ3Vyc29yIElERSB1cGRhdGU%3D), [Kimi K2.5 with Claude Code](https://www.youtube.com/watch?v=yZtLwOhmHps&pp=ygURQ3Vyc29yIElERSB1cGRhdGU%3D)).
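For comparison with per-token API pricing, the quoted hourly rates convert to an effective per-million-token cost, assuming the stated throughput is sustained:

```python
def usd_per_million_tokens(usd_per_hour, tokens_per_second):
    """Convert an hourly runtime price at a given decode speed
    into an effective per-million-token cost."""
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000

# The two M2.5 price points quoted above:
fast = usd_per_million_tokens(1.00, 100)  # ~$2.78 per 1M tokens
slow = usd_per_million_tokens(0.30, 50)   # ~$1.67 per 1M tokens
```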

2026-02-12
zhipu-ai glm-5 minimax minimax-m25 openrouter

Claude Code’s agentic push meets release governance

Claude Code is moving from autocomplete to autonomous delivery, and new updates plus governance patterns show how to adopt it safely across backends and data pipelines. Anthropic shipped multiple February hardening updates to Claude Code (2.1.39–2.1.42) that add a guard against nested sessions, clearer Bedrock/Vertex/Foundry fallbacks, CLI auth, Windows ARM64 support, and richer OpenTelemetry spans via a new speed attribute ([release notes](https://releasebot.io/updates/anthropic/claude-code)). As agentic coding scales beyond snippets to plans, tests, and commits, [Unleash’s guide](https://www.getunleash.io/blog/claude-code-unleash-agentic-ai-release-governance) lays out a FeatureOps playbook (standard flag naming, mandatory gating, and cleanup) tailored to Claude Code’s terminal + MCP architecture. For rollout, pilot Agent Teams on a low-risk service and wire it into CI under flags using this 13‑minute walkthrough ([video](https://www.youtube.com/watch?v=y9IYtWELMHw&pp=ygUYQUkgY29kaW5nIGFnZW50IHdvcmtmbG93)), scaffold workflows with the community’s [ultimate guide](https://github.com/FlorianBruniaux/claude-code-ultimate-guide), and use this Opus 4.6 technical dive to inform capability boundaries and prompt patterns ([deep dive](https://medium.com/@comeback01/the-arrival-of-claude-opus-4-6-a-technical-deep-dive-into-the-enterprise-ai-singularity-0f86002836c1)).
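The "mandatory gating" idea from the FeatureOps playbook can be sketched in a few lines: every agent-authored change ships dark behind a named flag, so rollback is a toggle rather than a revert. The flag store and naming convention below are illustrative, not Unleash's actual SDK.

```python
# Agent-authored code paths ship behind flags; rollback = flip the flag.
# FLAGS and the "ai." naming convention are illustrative assumptions.
FLAGS = {"ai.claude-code.retry-rewrite": False}  # default off in production

def fetch_v1(url):
    return f"v1:{url}"   # existing, human-reviewed path

def fetch_v2(url):
    return f"v2:{url}"   # agent-authored rewrite, still under review

def fetch(url):
    if FLAGS.get("ai.claude-code.retry-rewrite", False):
        return fetch_v2(url)
    return fetch_v1(url)
```

With a consistent naming scheme, cleanup (the third leg of the playbook) becomes a grep for stale `ai.*` flags once a rewrite has baked in production.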

2026-02-12
anthropic claude-code unleash claude-opus-46 bedrock

Claude Opus 4.6 adds agent teams, 1M context, and fast mode; GPT-5.3-Codex counters

Anthropic’s Claude Opus 4.6 ships multi-agent coding, a 1M-token context window, and a 2.5x fast mode, while OpenAI’s GPT-5.3-Codex brings faster agentic coding with strong benchmark results. DeepLearning.ai details Opus 4.6’s long-context, agentic coding gains, new API controls, and Codex 5.3’s speed and scores, plus pricing context [Data Points: Claude Opus 4.6 pushes the envelope](https://www.deeplearning.ai/the-batch/claude-opus-4-6-pushes-the-envelope/)[^1]. AI Collective highlights Claude Code’s new multi-agent “agent teams,” Office sidebars, and head-to-head benchmark moves versus OpenAI, while Storyboard18 confirms a 2.5x “fast mode” rollout for urgent work [Anthropic’s Opus 4.6 Agent Teams & OpenAI’s Codex 5.3](https://aicollective.substack.com/p/the-brief-anthropics-opus-46-agent)[^2] and [Anthropic rolls out fast mode for Claude Code](https://www.storyboard18.com/digital/anthropic-rolls-out-fast-mode-for-claude-code-to-speed-up-developer-workflows-89148.htm)[^3].

[^1]: Roundup covering features, benchmarks, and pricing for Opus 4.6 and GPT‑5.3‑Codex.
[^2]: Newsletter with details on "agent teams," 1M-context performance, Office integrations, and comparative benchmarks.
[^3]: Report on the 2.5x faster "fast mode" availability and target use cases.

2026-02-09
anthropic claude-opus-46 claude-code openai gpt-53-codex

Codex 5.3 vs Opus 4.6: agentic speed vs long‑context depth

OpenAI's GPT-5.3 Codex and Anthropic's Claude Opus 4.6 arrive with distinct strengths—Codex favors faster agentic execution while Opus excels at long-context reasoning and consistency—so choose based on workflow fit, not hype. Independent hands-on comparisons report Codex 5.3 is snappier and stronger at end-to-end coding actions, while Opus 4.6 is more reliable with context and needs less babysitting for routine repo tasks, with benchmark numbers and capabilities outlining the trade-offs in real projects ([Interconnects](https://www.interconnects.ai/p/opus-46-vs-codex-53)[^1], [Tensorlake](https://www.tensorlake.ai/blog/claude-opus-4-6-vs-gpt-5-3-codex)[^2]). Opus adds agent teams, 1M-token context (beta), and adaptive effort controls, while Codex claims ~25% speed gains and agentic improvements, underscoring a shift toward practical, multi-step workflows ([Elephas](https://elephas.app/resources/claude-opus-4-6-vs-gpt-5-3-codex)[^3]).

[^1]: Adds: Usability differences from field use; Opus needs less supervision on mundane tasks while Codex 5.3 improved but can misplace/skip files.
[^2]: Adds: Concrete benchmarks (SWE Bench Pro, Terminal Bench 2.0, OSWorld) and scenario-based comparison for UI/data workflows.
[^3]: Adds: Feature deltas (Agent Teams, 1M context, adaptive thinking) and speed claims/timing details across both launches.

2026-02-09
openai anthropic gpt-53-codex claude-opus-46 claude-code

Opus 4.6 Agent Teams vs GPT-5.3 Codex: multi‑agent coding arrives for real SDLC work

Anthropic's Claude Opus 4.6 brings multi-agent "Agent Teams" and a 1M-token context while OpenAI's GPT-5.3-Codex counters with faster, stronger agentic coding, together signaling a step change in AI-assisted development. Opus 4.6 adds team-based parallelization in Claude Code, long‑context retrieval gains, adaptive reasoning/effort controls, and Office sidebars, with pricing unchanged [Data Points](https://www.deeplearning.ai/the-batch/claude-opus-4-6-pushes-the-envelope/)[^1] and launch coverage framing initial benchmark leads at release [AI Collective](https://aicollective.substack.com/p/the-brief-anthropics-opus-46-agent)[^2]. OpenAI’s GPT‑5.3‑Codex posts top results on SWE‑Bench Pro and Terminal‑Bench 2.0 and helped debug its own training pipeline [Data Points](https://www.deeplearning.ai/the-batch/claude-opus-4-6-pushes-the-envelope/)[^3], while practitioners surface Claude Code’s new Auto‑Memory behavior/controls for safer long‑running projects [Reddit](https://www.reddit.com/r/ClaudeCode/comments/1qzmofn/how_claude_code_automemory_works_official_feature/)[^4] and Anthropic leaders say AI now writes nearly all their internal code [India Today](https://www.indiatoday.in/technology/news/story/anthropic-says-ai-writing-nearly-100-percent-code-internally-claude-basically-writes-itself-now-2865644-2026-02-09)[^5].

[^1]: Adds: Opus 4.6 features (1M context), long‑context results, adaptive/effort/compaction API controls, and unchanged pricing.
[^2]: Adds: Agent Teams in Claude Code, Office (Excel/PowerPoint) sidebars, 1M context, and benchmark framing at launch.
[^3]: Adds: GPT‑5.3‑Codex benchmarks, 25% speedup, availability, and self‑use in OpenAI’s training/deployment pipeline.
[^4]: Adds: Concrete Auto‑Memory details (location, 200‑line cap) and disable flag for policy compliance.
[^5]: Adds: Real‑world claim of near‑100% AI‑written internal code at Anthropic, indicating mature SDLC use.

2026-02-09
anthropic openai claude-opus-46 claude-code gpt-53-codex

Copilot CLI 0.0.406 adds MCP upgrades and Claude preview; community proxy unlocks Copilot in Cursor

GitHub Copilot CLI 0.0.406 brings MCP-focused UX improvements, a Claude Opus 4.6 Fast preview, and safer flags, while a community proxy shows how to use a Copilot subscription inside Cursor’s Agent features. Per the official notes, v0.0.406 adds Claude Opus 4.6 Fast support, command-to-skill translation, /changelog, MCP status, structured responses for VS Code, a URL-based plugin marketplace, and a --no-experimental flag [GitHub Copilot CLI releases](https://github.com/github/copilot-cli/releases)[^1]. A community guide details a "Copilot Proxy for Cursor" that routes Cursor to your Copilot key with MCP/tool support and vision handling; use cautiously given it relies on internal APIs [DEV: Unlock GitHub Copilot in Cursor](https://dev.to/jacksonkasi/unlock-github-copilot-in-cursor-the-ultimate-guide-free-unlimited-4i9c)[^2].

[^1]: Adds: Official 0.0.406 features, MCP/skills changes, and safety flags.
[^2]: Adds: How the proxy works, setup steps, supported models/tools, and caveats.

2026-02-07
github-copilot github copilot-cli cursor anthropic

Hands-on: Claude Opus 4.6 nails non‑agentic coding; GPT‑5.3 Codex lacks API

A 48-hour hands-on found Claude Opus 4.6 delivering perfect non-agentic coding results while GPT‑5.3 Codex looks strong in benchmarks but still lacks API access for validation. In this test run, Opus 4.6 hit 100% across 11 single-shot coding tasks (including 3D layout, SVG composition, and legal-move chess) and contradicted popular benchmark narratives, while Codex couldn’t be reproduced due to no API access yet per this report [I Spent 48 Hours Testing Claude Opus 4.6 & GPT-5.3 Codex](https://medium.com/@info.booststash/i-spent-48-hours-testing-claude-opus-4-6-gpt-5-3-codex-004adc046312)[^1].

[^1]: Adds: hands-on results, examples, benchmark context, and note on GPT‑5.3 Codex API unavailability.

2026-02-07
claude-opus-46 gpt-53-codex anthropic openai terminal-bench

User flags degraded Claude Opus 4.6 behavior and higher credit burn in Windsurf vs Claude Code

A Reddit report describes noticeably worse results and higher credit burn when using Claude Opus 4.6 through Windsurf compared to running the same model via Claude Code directly. The post details unnecessary back-and-forth, confrontational replies, and 2×–4× credit multipliers in [this thread](https://www.reddit.com/r/windsurf/comments/1qxpcfd/is_anyone_else_getting_really_frustrated_with/)[^1].

[^1]: Adds: First-hand comparison of Windsurf vs Claude Code behavior, including examples and credit multipliers.

2026-02-07
windsurf claude-opus-46 claude-code claude-opus ai-coding-assistants

Choosing Cursor, Windsurf, or Claude Code for backend workflows

The AI coding stack is bifurcating: IDE-first agents like [Cursor](https://serenitiesai.com/articles/cursor-ai-vs-windsurf-vs-claude-code-2026)[^2] and Windsurf emphasize editor-native control, while [Claude Code](https://rajsarkar.substack.com/p/part-4-cursor-vs-claude-code-two)[^1] is terminal-native and architected for agentic, repo-wide plans and execution—pick based on your team’s primary locus of work (editor vs CLI). Near-term shifts matter: rumors of Anthropic’s Sonnet 5 and OpenAI’s upcoming Codex updates could change cost/throughput and tool hooks, but balance vendor claims against independent evidence that AI boosts can inhibit skills formation and may be uneven across experience levels ([Handy AI](https://handyai.substack.com/p/anthropic-preps-sonnet-5-while-openai)[^3], [ITPro](https://www.itpro.com/software/development/anthropic-research-ai-coding-skills-formation-impact)[^4], [Futurum](https://futurumgroup.com/insights/100-ai-generated-code-can-you-code-like-boris/)[^5]).

[^1]: Adds: hands-on analysis contrasting IDE vs CLI mental models and Claude Code’s agentic loop.
[^2]: Adds: feature/pricing comparison and trade-offs across Cursor, Windsurf, and Claude Code.
[^3]: Adds: rumor timeline on Sonnet 5 and OpenAI Codex/GPT-5.3 rollouts that could shift capabilities.
[^4]: Adds: Anthropic fellows’ study showing productivity gains can inhibit skills formation, especially when delegating fully.
[^5]: Adds: reality check contrasting 100% AI-code claims with broad empirical findings on actual gains and reliability.

2026-02-03
cursor windsurf claude-code anthropic openai

Update: Shift from Bigger LLMs to Tool-Using Agents

New coverage moves from high-level trend to concrete examples: agentic systems with persistent memory, tool-grounded actions, and human-in-the-loop controls. The video highlights vendor moves (e.g., Anthropic’s Claude/Claude Code updates and DeepMind’s agent-first roadmap) as evidence that reliability/cost gains now come from tools, memory, and planning rather than scaling base models.
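The three ingredients named above (tool-grounded actions, persistent memory, human-in-the-loop controls) fit in a very small loop. This is a generic sketch of the pattern, not any vendor's implementation; all tool names and the default-deny policy are illustrative.

```python
"""Minimal sketch of the agent pattern described above: tool-grounded
actions, persistent memory, and a human approval gate for risky steps.
All tool names and policies are illustrative."""

TOOLS = {
    "search": lambda q: f"results for {q!r}",
    "delete_file": lambda p: f"deleted {p}",
}
RISKY = {"delete_file"}   # actions requiring human sign-off
MEMORY = []               # persists across steps (and, in practice, sessions)

def approve(action, arg):
    # Stub: a real system would surface this to a human reviewer.
    return False          # default-deny

def step(action, arg):
    if action in RISKY and not approve(action, arg):
        result = f"REFUSED {action}"
    else:
        result = TOOLS[action](arg)   # action grounded in a real tool call
    MEMORY.append((action, arg, result))  # record available to later planning
    return result

step("search", "agent memory designs")
step("delete_file", "/tmp/scratch")       # blocked by the approval gate
```

The reliability/cost claim in the coverage maps directly onto this structure: tools constrain what the model can do, memory avoids re-deriving context, and the gate keeps irreversible actions under human control, none of which requires a larger base model.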

2025-12-30
agents tool-use memory enterprise-ai rag