howtonotcode.com

Terminal Bench

Service

Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., doing business as DeepSeek, is a Chinese artificial intelligence (AI) company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, DeepSeek is owned and funded by the Chinese hedge fund High-Flyer. DeepSeek was founded in July 2023 by Liang Wenfeng, the co-founder of High-Flyer, who also serves as CEO of both companies. The company launched an eponymous chatbot alongside its DeepSeek-R1 model in January 2025.

3 stories · First seen: 2026-02-09 · Last seen: 2026-02-17 · Website · Wikipedia

Resources

Links to check for updates: homepage, feed, or git repo.

Homepage

Stories

Showing 1-3 of 3

Open-weight "AI engineer" models arrive: Qwen 3.5, GLM-5, MiniMax M2.5

A new wave of open-weight frontier models now rivals closed systems on coding and long-horizon agent tasks, making self-hosted AI-engineer workflows practical for backend and data teams. Alibaba's Qwen 3.5 ships as an open‑weights Mixture‑of‑Experts model (397B total parameters, 17B active) with multimodal input and a 256K context, alongside a hosted Qwen3.5‑Plus variant offering 1M context and built‑in tools; details and early impressions are summarized in Simon Willison's write‑up of the [Qwen 3.5 release](https://simonwillison.net/2026/Feb/17/qwen35/#atom-everything) and the official [Qwen blog](https://qwen.ai/blog?id=qwen3.5). Z.ai's GLM‑5 launched open source with top open-model scores on SWE‑bench‑Verified (77.8) and Terminal Bench 2.0 (56.2), plus advances in long‑context handling and RL‑driven agent training; see the announcement at [BusinessWire](https://www.businesswire.com/news/home/20260215030665/en/GLM-5-Launch-Signals-a-New-Era-in-AI-When-Models-Become-Engineers) and the code in the [GitHub repo](https://github.com/zai-org/GLM-5). MiniMax M2.5 claims state‑of‑the‑art coding/agent performance (e.g., 80.2% on SWE‑Bench Verified) and aggressive pricing and speed on its [Hugging Face card](https://huggingface.co/unsloth/MiniMax-M2.5), while hands‑on videos compare real coding runs between GLM‑5 and M2.5; free models can also be trialed quickly via [OpenRouter's free router](https://openrouter.ai/openrouter/free).

2026-02-17
qwen35-397b-a17b qwen35-plus qwen-chat alibaba-cloud glm-5
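For readers who want to trial these models via OpenRouter as the story suggests, a minimal sketch is below. OpenRouter exposes an OpenAI-compatible chat-completions endpoint; the model ID `openrouter/free` mirrors the slug in the link above and is an assumption — substitute any model ID listed on OpenRouter, and set `OPENROUTER_API_KEY` before making a live call.

```python
import json
import os
import urllib.request

# OpenRouter's OpenAI-compatible chat-completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def trial_model(model: str, prompt: str) -> str:
    """Send a single prompt; requires OPENROUTER_API_KEY in the environment."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI-style response shape: first choice's message content.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Dry run: print the payload without sending a network request.
    # "openrouter/free" is a placeholder model ID (see lead-in).
    payload = build_chat_request("openrouter/free", "Write a haiku about terminals.")
    print(json.dumps(payload, indent=2))
```

The same payload works against any OpenAI-compatible gateway, which is what makes quick side-by-side trials of these open-weight releases cheap to script.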

Codex 5.3 vs Opus 4.6: agentic speed vs long‑context depth

OpenAI's GPT-5.3 Codex and Anthropic's Claude Opus 4.6 arrive with distinct strengths — Codex favors faster agentic execution while Opus excels at long-context reasoning and consistency — so choose based on workflow fit, not hype. Independent hands-on comparisons report that Codex 5.3 is snappier and stronger at end-to-end coding actions, while Opus 4.6 is more reliable with context and needs less babysitting on routine repo tasks, with benchmark numbers and capability notes outlining the trade-offs in real projects ([Interconnects](https://www.interconnects.ai/p/opus-46-vs-codex-53)[^1], [Tensorlake](https://www.tensorlake.ai/blog/claude-opus-4-6-vs-gpt-5-3-codex)[^2]). Opus adds agent teams, a 1M-token context (beta), and adaptive effort controls, while Codex claims ~25% speed gains and agentic improvements, underscoring a shift toward practical, multi-step workflows ([Elephas](https://elephas.app/resources/claude-opus-4-6-vs-gpt-5-3-codex)[^3]).

[^1]: Adds usability differences from field use: Opus needs less supervision on mundane tasks, while Codex 5.3 has improved but can still misplace or skip files.

[^2]: Adds concrete benchmarks (SWE Bench Pro, Terminal Bench 2.0, OSWorld) and scenario-based comparisons for UI/data workflows.

[^3]: Adds feature deltas (Agent Teams, 1M context, adaptive thinking) and speed claims/timing details across both launches.

2026-02-09
openai anthropic gpt-53-codex claude-opus-46 claude-code

Hands-on: Claude Opus 4.6 nails non‑agentic coding; GPT‑5.3 Codex lacks API

A 48-hour hands-on test found Claude Opus 4.6 delivering flawless non-agentic coding results, while GPT‑5.3 Codex looks strong in benchmarks but still lacks the API access needed for independent validation. In this test run, Opus 4.6 scored 100% across 11 single-shot coding tasks (including 3D layout, SVG composition, and legal-move chess), contradicting popular benchmark narratives, while Codex's results couldn't be reproduced because no API access is available yet, per this report: [I Spent 48 Hours Testing Claude Opus 4.6 & GPT-5.3 Codex](https://medium.com/@info.booststash/i-spent-48-hours-testing-claude-opus-4-6-gpt-5-3-codex-004adc046312)[^1].

[^1]: Adds hands-on results, examples, benchmark context, and a note on GPT‑5.3 Codex API unavailability.

2026-02-07
claude-opus-46 gpt-53-codex anthropic openai terminal-bench