terminal
howtonotcode.com
Qwen logo

Qwen

Ai Tool

Qwen is a family of large language models developed by Alibaba Cloud.

article 6 storys calendar_today First seen: 2026-02-11 update Last seen: 2026-03-03 open_in_new Website menu_book Wikipedia

Resources

Links to check for updates: homepage, feed, or git repo.

home Homepage

Stories

Showing 1-6 of 6

Google’s Gemini 3.1 Flash-Lite targets high-volume, low-latency workloads

Google released Gemini 3.1 Flash-Lite, a faster, cheaper model aimed at high-volume developer workloads and signaling a broader shift to lighter LLMs for routine backend and data tasks. Google’s launch of [Gemini 3.1 Flash-Lite](https://thenewstack.io/google-gemini-3-1-flash-lite/) emphasizes low-latency responses for tasks where cost is critical, with preview access via the Gemini API in Google AI Studio and enterprise access in Vertex AI, alongside industry moves like OpenAI’s GPT-5.3 Instant toward lighter models ([context and availability](https://www.thedeepview.com/articles/openai-google-target-lighter-models)). Independent coverage pegs Flash-Lite at $0.25/million input tokens and $1.5/million output tokens—about one-eighth the price of Gemini 3.1 Pro—and notes support for four “thinking” levels to trade speed for reasoning when needed ([pricing and modes](https://simonwillison.net/2026/Mar/3/gemini-31-flash-lite/#atom-everything)). For backend/data teams, this sweet spot makes Flash-Lite a strong default for translation, content moderation, summarization, and structured generation (dashboards/simulations), reserving heavier models for only the hardest requests ([use cases](https://www.thedeepview.com/articles/openai-google-target-lighter-models)). If your pipelines push files, mind Gemini’s surface-specific limits across Apps (including NotebookLM notebooks), API, and enterprise tools—think up to 10 files per prompt, 100MB per file/ZIP with caveats, strict video caps, and code folder/GitHub repo constraints—so ingestion doesn’t silently truncate or fail ([file-handling constraints](https://www.datastudios.org/post/gemini-file-upload-support-explained-supported-formats-size-constraints-and-document-handling-acr)). Zooming out, the race to lighter models (OpenAI’s GPT-5.3 Instant and Alibaba’s Qwen Small Model Series) underscores a clear pattern: push routine throughput to cheaper, faster tiers and escalate to heavyweight reasoning only on ambiguity or failure ([trend snapshot](https://www.thedeepview.com/articles/openai-google-target-lighter-models)).

calendar_today 2026-03-03
google gemini-31-flash-lite gemini-api google-ai-studio vertex-ai

Coding Benchmarks Shake-up: Qwen 3.5, MiniMax M2.5, and a SWE-bench Reality Check

Open models like Alibaba’s Qwen 3.5 and MiniMax M2.5 post strong coding-agent results, but OpenAI’s audit of SWE-bench Verified shows contamination and flawed tests that can mislead real-world adoption. Alibaba’s Qwen 3.5 family uses a sparse MoE design (397B total/17B active), ships open weights under Apache 2.0, and shows strong instruction following and competitive coding scores in public benchmarks, with setup guidance and comparisons to frontier models detailed in this deep-dive guide [Qwen 3.5: The Complete Guide](https://techie007.substack.com/p/qwen-35-the-complete-guide-benchmarks). MiniMax’s latest model claims state-of-the-art coding and agentic performance, faster task completion, and ultra-low runtime cost (about $1/hour at 100 tok/s), alongside reported scores on coding and browsing evaluations [MiniMax-M2.5 on Hugging Face](https://huggingface.co/unsloth/MiniMax-M2.5). OpenAI, however, reports that many SWE-bench Verified tasks have broken tests and that major models were trained on benchmark solutions, halting its use of the metric and urging caution in interpreting scores [OpenAI Abandons SWE-bench Verified](https://blockchain.news/news/openai-abandons-swe-bench-verified-contamination-flawed-tests). For quick, low-cost trials of multiple “top models,” a short explainer points to an Alibaba Cloud coding plan bundling popular options [This $3 AI Coding Plan Gives You Every Top Model You Need](https://www.youtube.com/watch?v=Qnz7S-5fzWo&pp=ygUXbmV3IEFJIG1vZGVsIGZvciBjb2RpbmfSBwkJrgoBhyohjO8%3D).

calendar_today 2026-03-03
qwen-35 alibaba alibaba-cloud minimax-m25 openai

Open-weight "AI engineer" models arrive: Qwen 3.5, GLM-5, MiniMax M2.5

A new wave of open-weight frontier models now rivals closed systems on coding and long-horizon agent tasks, making self-hosted AI engineer workflows practical for backend and data teams. Alibaba’s Qwen 3.5 ships as an open‑weights Mixture‑of‑Experts model (397B total, 17B active) with multimodal input and a 256K context, alongside a hosted Qwen3.5‑Plus variant offering 1M context and built‑in tools; details and early impressions are summarized by Simon Willison’s write‑up of the [Qwen 3.5 release](https://simonwillison.net/2026/Feb/17/qwen35/#atom-everything) and the official [Qwen blog](https://qwen.ai/blog?id=qwen3.5). Z.ai’s GLM‑5 launched open source with top open-model scores on SWE‑bench‑Verified (77.8) and Terminal Bench 2.0 (56.2), plus long‑context and RL‑driven agent training advances, with the announcement and code at [BusinessWire](https://www.businesswire.com/news/home/20260215030665/en/GLM-5-Launch-Signals-a-New-Era-in-AI-When-Models-Become-Engineers) and the [GitHub repo](https://github.com/zai-org/GLM-5). MiniMax M2.5 claims state‑of‑the‑art coding/agent performance (e.g., 80.2% SWE‑Bench Verified) and aggressive cost/speed on its [Hugging Face card](https://huggingface.co/unsloth/MiniMax-M2.5), while hands‑on videos compare real coding runs for GLM‑5 and M2.5; you can also quickly trial free models via [OpenRouter’s free router](https://openrouter.ai/openrouter/free).

calendar_today 2026-02-17
qwen35-397b-a17b qwen35-plus qwen-chat alibaba-cloud glm-5

LLM safety erosion: single-prompt fine-tuning and URL preview data leaks

Enterprise fine-tuning and common chat UI features can quickly undermine LLM safety and silently exfiltrate data, so treat agentic AI security as a lifecycle with zero‑trust controls and gated releases. Microsoft’s GRP‑Obliteration shows a single harmful prompt used with GRPO can collapse guardrails across several model families, reframing safety as an ongoing process rather than a one‑time alignment step [InfoWorld](https://www.infoworld.com/article/4130017/single-prompt-breaks-ai-safety-in-15-major-language-models-2.html)[^1] and is reinforced by a recap urging teams to add safety evaluations to CI/CD pipelines [TechRadar](https://www.techradar.com/pro/microsoft-researchers-crack-ai-guardrails-with-a-single-prompt)[^2]. Separately, researchers demonstrate that automatic URL previews can exfiltrate sensitive data via prompt‑injected links, and a practical release checklist outlines SDLC gates to verify value, trust, and safety before launching agents [WebProNews](https://www.webpronews.com/the-silent-leak-how-url-previews-in-llm-powered-tools-are-quietly-exfiltrating-sensitive-data/)[^3] [InfoWorld](https://www.infoworld.com/article/4105884/10-essential-release-criteria-for-launching-ai-agents.html)[^4]. [^1]: Adds: original reporting on Microsoft’s GRP‑Obliteration results and cross‑model safety degradation. [^2]: Adds: lifecycle framing and guidance to integrate safety evaluations into CI/CD. [^3]: Adds: concrete demonstration of URL‑preview data exfiltration via prompt injection (OpenClaw case study). [^4]: Adds: actionable release‑readiness checklist for AI agents (metrics, testing, governance).

calendar_today 2026-02-10
microsoft azure gpt-oss deepseek-r1-distill google