howtonotcode.com

Opus 4.6

AI Tool

Claude Opus 4.6 is a large language model by Anthropic.

7 stories. First seen: 2026-02-17. Last seen: 2026-03-03.

Stories

Coding Benchmarks Shake-up: Qwen 3.5, MiniMax M2.5, and a SWE-bench Reality Check

Open models like Alibaba’s Qwen 3.5 and MiniMax M2.5 post strong coding-agent results, but OpenAI’s audit of SWE-bench Verified shows contamination and flawed tests that can mislead real-world adoption. Alibaba’s Qwen 3.5 family uses a sparse MoE design (397B total/17B active), ships open weights under Apache 2.0, and shows strong instruction following and competitive coding scores in public benchmarks, with setup guidance and comparisons to frontier models detailed in this deep-dive guide [Qwen 3.5: The Complete Guide](https://techie007.substack.com/p/qwen-35-the-complete-guide-benchmarks). MiniMax’s latest model claims state-of-the-art coding and agentic performance, faster task completion, and ultra-low runtime cost (about $1/hour at 100 tok/s), alongside reported scores on coding and browsing evaluations [MiniMax-M2.5 on Hugging Face](https://huggingface.co/unsloth/MiniMax-M2.5). OpenAI, however, reports that many SWE-bench Verified tasks have broken tests and that major models were trained on benchmark solutions, halting its use of the metric and urging caution in interpreting scores [OpenAI Abandons SWE-bench Verified](https://blockchain.news/news/openai-abandons-swe-bench-verified-contamination-flawed-tests). For quick, low-cost trials of multiple “top models,” a short explainer points to an Alibaba Cloud coding plan bundling popular options [This $3 AI Coding Plan Gives You Every Top Model You Need](https://www.youtube.com/watch?v=Qnz7S-5fzWo&pp=ygUXbmV3IEFJIG1vZGVsIGZvciBjb2RpbmfSBwkJrgoBhyohjO8%3D).
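As a back-of-the-envelope check on the MiniMax cost claim, the quoted figures (about $1/hour at 100 tok/s) can be converted into a price per million tokens. The sketch below is illustrative only: the function name is made up for this example, and it assumes the model sustains the quoted rate for the full hour, which real workloads may not.

```python
def cost_per_million_tokens(dollars_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly runtime cost into dollars per one million tokens.

    Assumes the quoted token rate is sustained for the entire hour.
    """
    tokens_per_hour = tokens_per_second * 3600
    return dollars_per_hour / tokens_per_hour * 1_000_000


# Figures quoted in the story: about $1/hour at 100 tok/s.
print(round(cost_per_million_tokens(1.0, 100.0), 2))  # 2.78
```

Under those assumptions the claim works out to roughly $2.78 per million tokens; idle time or lower sustained throughput would push the effective figure higher.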

2026-03-03
qwen-35 alibaba alibaba-cloud minimax-m25 openai

Claude Code Security preview lands alongside key CLI hardening

Anthropic shipped a limited Claude Code Security preview to scan repos and suggest patches, alongside CLI updates that improve remote build control, sandboxed hooks, and context efficiency. Anthropic’s code-scanning capability is now built into Claude Code as a limited research preview for Enterprise and Team customers, with human-in-the-loop patch suggestions and expedited access for OSS maintainers, per coverage from [CSO Online](https://www.csoonline.com/article/4136294/anthropics-claude-code-security-rollout-is-an-industry-wakeup-call.html). In parallel, the CLI added a new remote-control mode for external builds, hardened HTTP hooks behind a sandbox proxy and explicit allowedEnvVars, persisted large tool outputs to disk to save context, and fixed a workspace-trust gap; the VS Code extension also received a Windows crash fix ([v2.1.51](https://github.com/anthropics/claude-code/releases/tag/v2.1.51), [v2.1.52](https://github.com/anthropics/claude-code/releases/tag/v2.1.52)). Teams are also adjusting to a simplified CLI output that hides some file I/O; practitioners suggest prompting for a pre-action file list, effectively a dry-run step, to restore transparency and control ([community thread](https://www.reddit.com/r/ClaudeCode/comments/1rdj2hm/handling_the_simplified_output_changes_in_the/)). The wider ecosystem is keeping pace: LangChain’s Anthropic integration updated headers for 1M-context handling, model IDs, and tests, smoothing orchestration in agent workflows ([release notes](https://github.com/langchain-ai/langchain/releases/tag/langchain-anthropic%3D%3D1.3.4)).

2026-02-24
anthropic claude-code claude-code-security visual-studio-code langchain

Claude Code v2.1.49 hardens long-running agents, adds audit hooks, and moves Max users to Sonnet 4.6 (1M)

Anthropic shipped Claude Code v2.1.49 with major stability and performance fixes for long-running sessions, new enterprise audit controls, and a Max-plan model shift to Sonnet 4.6 with a 1M-token context window. The v2.1.49 release notes highlight fixes for memory growth in WASM parsing and layout engines and for background agent interrupt handling (double Ctrl+C/ESC), along with faster non-interactive startup (-p), plugin scope auto-detection, and a prompt cache regression fix. In addition, simple mode gains a direct file edit tool, and the SDKs now expose capability flags such as supportsAdaptiveThinking ([release](https://github.com/anthropics/claude-code/releases/tag/v2.1.49)). Enterprise teams get a new ConfigChange hook to log or block config edits mid-session, and Max-plan users should switch to Sonnet 4.6 (1M) because Sonnet 4.5 (1M) is being removed. For context on why these changes matter, Anthropic engineers have emphasized prompt caching as key to cost and latency in long-lived agent workflows ([note](https://simonwillison.net/2026/Feb/20/thariq-shihipar/#atom-everything)), and leadership is openly reframing developer roles toward reviewing and steering AI-authored code rather than typing it by hand ([Boris Cherny interview](https://www.youtube.com/watch?v=We7BZVKbCVw&t=977s&pp=ygUYQUkgY29kaW5nIGFnZW50IHdvcmtmbG93)).

2026-02-20
claude-code anthropic claude-sonnet-46 github sonnet-46

Delegation vs. coordination: Codex 5.3 or Opus 4.6 for your engineering workflows

OpenAI’s Codex 5.3 favors long-running autonomous delegation while Anthropic’s Opus 4.6 favors coordinated, tool-integrated agent teams, and picking one early will shape your workflows and switching costs. In this analysis of two same-day releases, Codex 5.3 is framed as an agent you hand a task to and walk away from for hours, whereas Opus 4.6 is positioned to plug into your existing tools, orchestrate agent teams, and extend beyond code into broader knowledge work ([read the comparison](https://natesnewsletter.substack.com/p/codex-53-vs-opus-46-two-agent-philosophies)). The piece contrasts a “correctness architecture” for Codex—aimed at producing work you can trust without reviewing every line—against Claude’s integration-first approach with a protocol layer and agent teams. For engineering leaders, the key moves are a workflow audit (which tasks benefit from autonomy vs. coordination), explicit correctness gates, and an understanding that this choice compounds—affecting org structure, toolchains, and the difficulty of switching later ([full brief](https://natesnewsletter.substack.com/p/codex-53-vs-opus-46-two-agent-philosophies)).

2026-02-17
openai anthropic codex-53 claude-opus-46 claude

Open-weight "AI engineer" models arrive: Qwen 3.5, GLM-5, MiniMax M2.5

A new wave of open-weight frontier models now rivals closed systems on coding and long-horizon agent tasks, making self-hosted AI engineer workflows practical for backend and data teams. Alibaba’s Qwen 3.5 ships as an open‑weights Mixture‑of‑Experts model (397B total, 17B active) with multimodal input and a 256K context, alongside a hosted Qwen3.5‑Plus variant offering 1M context and built‑in tools; details and early impressions are summarized by Simon Willison’s write‑up of the [Qwen 3.5 release](https://simonwillison.net/2026/Feb/17/qwen35/#atom-everything) and the official [Qwen blog](https://qwen.ai/blog?id=qwen3.5). Z.ai’s GLM‑5 launched open source with top open-model scores on SWE‑bench‑Verified (77.8) and Terminal Bench 2.0 (56.2), plus long‑context and RL‑driven agent training advances, with the announcement and code at [BusinessWire](https://www.businesswire.com/news/home/20260215030665/en/GLM-5-Launch-Signals-a-New-Era-in-AI-When-Models-Become-Engineers) and the [GitHub repo](https://github.com/zai-org/GLM-5). MiniMax M2.5 claims state‑of‑the‑art coding/agent performance (e.g., 80.2% SWE‑Bench Verified) and aggressive cost/speed on its [Hugging Face card](https://huggingface.co/unsloth/MiniMax-M2.5), while hands‑on videos compare real coding runs for GLM‑5 and M2.5; you can also quickly trial free models via [OpenRouter’s free router](https://openrouter.ai/openrouter/free).
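To put the Qwen 3.5 parameter counts in self-hosting terms, the following rough sketch computes the share of parameters active per token and the memory needed just to hold the weights at a few precisions. The helper names are invented for this example, and the estimate deliberately ignores KV cache, activations, and runtime overhead, so treat the results as lower bounds rather than sizing guidance.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of a Mixture-of-Experts model's parameters used per token."""
    return active_params_b / total_params_b


def weight_memory_gb(total_params_b: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    All experts must be resident even though only some are active per token.
    """
    return total_params_b * bits_per_param / 8


# Figures from the story: 397B total parameters, 17B active.
print(round(active_fraction(397, 17) * 100, 1))  # 4.3 (percent)
print(weight_memory_gb(397, 16))  # 794.0 GB at 16-bit
print(weight_memory_gb(397, 4))   # 198.5 GB at 4-bit
```

The sparse design reduces compute per token to about 4.3% of the parameters, but the full 397B still has to fit in memory, which is why quantization matters for self-hosted deployments.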

2026-02-17
qwen35-397b-a17b qwen35-plus qwen-chat alibaba-cloud glm-5

Anthropic’s Claude Code pushes into regulated enterprises as devs demand more agent transparency

Anthropic is expanding Claude Code from heavy internal use for code generation into regulated enterprise settings while shipping updates and fielding developer concerns about opaque agent behavior. Anthropic says its AI systems now generate nearly all of the company’s internal code, reframing engineers’ roles toward system design and review as described in this report from Moneycontrol ([source](https://www.moneycontrol.com/news/business/information-technology/why-anthropic-says-engineers-matter-more-than-ever-even-as-ai-writes-the-code-13830811.html)). Building on that, Anthropic announced a collaboration with Infosys to deliver agentic AI for telecom, financial services, and manufacturing via Infosys Topaz and the Claude Agent SDK, targeting persistent, multi-step workflows with governance needs ([announcement](https://www.anthropic.com/news/anthropic-infosys)). AWS also outlined how to run Claude Code in compliance-sensitive environments on Amazon Bedrock, aimed at aligning AI-assisted dev work with strict controls ([AWS blog](https://aws.amazon.com/blogs/machine-learning/supercharge-regulated-workloads-with-claude-code-and-amazon-bedrock/)). On the ground, developers called out visibility gaps around what agents do to their codebases in a widely discussed Hacker News thread ([discussion](https://news.ycombinator.com/item?id=47033622)), even as Anthropic continues frequent incremental fixes such as auth refresh repairs and improved error messaging in recent Claude Code releases ([release notes](https://github.com/anthropics/claude-code/releases)). Community demos show evolving workflows, such as Plan Mode and multi-agent patterns in Opus 4.6, that promise more autonomous execution but heighten the need for auditability ([Plan Mode walkthrough](https://www.youtube.com/watch?v=fxj82iBWypA&pp=ygUSQ2xhdWRlIENvZGUgdXBkYXRl), [Agent Teams demo](https://www.youtube.com/watch?v=6UKUQNcRk2k&pp=ygUYQUkgY29kaW5nIGFnZW50IHdvcmtmbG93)).

2026-02-17
anthropic claude claude-code claude-agent-sdk infosys

Choosing your LLM lane: fast modes, Azure guardrails, and lock‑in risks

Picking between Azure OpenAI, OpenAI, and Anthropic now requires balancing fast‑mode latency tradeoffs, enterprise guardrails, and ecosystem lock‑in that will shape your backend and data pipelines. Kellton’s guide argues that Microsoft’s Azure OpenAI service brings OpenAI models into an enterprise‑ready envelope with compliance certifications, data residency, and cost control via reserved capacity, while integrating natively with Azure services ([overview](https://www.kellton.com/kellton-tech-blog/azure-openai-enterprise-business-intelligence-automation)). On performance, Sean Goedecke contrasts “fast mode” implementations: Anthropic’s approach serves the primary model with roughly 2.5x higher token throughput, while OpenAI’s delivers >1000 tps via a faster, separate variant that can be less reliable for tool calls; he hypothesizes Anthropic leans on low‑batch inference and OpenAI on specialized Cerebras hardware ([analysis](https://www.seangoedecke.com/fast-llm-inference/)). A contemporaneous perspective frames OpenAI vs Anthropic as a fight to control developer defaults: your provider choice becomes a dependency that dictates pricing, latency profile, and roadmap gravity, not just model quality ([viewpoint](https://medium.com/@kakamber07/openai-vs-anthropic-is-not-about-ai-its-about-who-controls-developers-51ef2232777e)).
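To make the fast-mode numbers concrete, here is a small sketch of what the two quoted figures imply for generation time. The helper names and the 2,000-token response size are hypothetical choices for illustration; the calculation covers token generation only and ignores time to first token, network latency, and tool-call round trips.

```python
def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time spent generating output tokens at a steady rate."""
    return output_tokens / tokens_per_second


def relative_time(throughput_multiplier: float) -> float:
    """Fraction of the original generation time left after a throughput speedup."""
    return 1 / throughput_multiplier


# A hypothetical 2,000-token response at the quoted 1000 tps.
print(decode_seconds(2000, 1000))  # 2.0 seconds
# A 2.5x throughput gain leaves 40% of the original generation time.
print(relative_time(2.5))  # 0.4
```

Because the 2.5x figure is a multiplier rather than an absolute rate, its real-world effect depends on the baseline throughput of the primary model, which the story does not state.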

2026-02-17
azure-openai-service azure microsoft openai anthropic