GOOGLE PUB_DATE: 2026.05.20

GOOGLE SHIPS GEMINI 3.5 FLASH: GA, AGENT-READY, AND FAST ENOUGH TO MATTER

Google launched Gemini 3.5 Flash, a general-availability model tuned for agentic workflows with big speed gains and lower API pricing than Pro. Google says 3.5...

Google ships Gemini 3.5 Flash: GA, agent-ready, and fast enough to matter

Google launched Gemini 3.5 Flash, a general-availability model tuned for agentic workflows with big speed gains and lower API pricing than Pro.

Google says 3.5 Flash is its fastest agent/coding model yet—up to 4x quicker than rival frontier models—and it’s live across the Gemini app, Search AI Mode, and the Gemini API today (Interesting Engineering, Lifehacker).

Ars reports ~300 tokens/sec and API pricing at $1.50/M input, $9/M output (cheaper than Pro), alongside a 1M-token context and 65k output window noted by Simon Willison (Ars Technica, Simon Willison).

Google is also piloting a new Interactions API (server-side history) and rolling 3.5 Flash through Antigravity, AI Studio, and Android Studio; the llm-gemini plugin adds immediate support (Simon Willison, llm-gemini 0.32, llm-gemini 0.32a0).

[ WHY_IT_MATTERS ]
01.

3.5 Flash pairs frontier-level reasoning with much lower latency, making long-running agent jobs and code automation more feasible at scale.

02.

API pricing below Pro shifts cost/perf math for high-volume workloads where output tokens dominate.

[ WHAT_TO_TEST ]
  • terminal

    Benchmark your real agent flows (tool use, retries, parallel steps) on 3.5 Flash vs current model for latency, cost per task, and accuracy.

  • terminal

    Soak-test long-horizon jobs with backpressure, cancellation, and idempotency; validate output-token spikes and guardrails.

[ BROWNFIELD_PERSPECTIVE ]

Legacy codebase integration strategies...

  • 01.

    Drop 3.5 Flash behind your LLM gateway and run canary routing; keep fallbacks to Pro/other vendors for edge cases.

  • 02.

    Audit tokenizer and format drift in prompts/functions; update observability to track per-step token and error rates.

[ GREENFIELD_PERSPECTIVE ]

Fresh architecture paradigms...

  • 01.

    Design agents around the Interactions API for server-side history and cheaper context reuse.

  • 02.

    Default to 3.5 Flash for orchestration and coding tasks; isolate modules that may later need Pro-class reasoning.

Enjoying_this_story?

Get daily GOOGLE + SDLC updates.

  • Practical tactics you can ship tomorrow
  • Tooling, workflows, and architecture notes
  • One short email each weekday

FREE_FOREVER. TERMINATE_ANYTIME. View an example issue.

GET_DAILY_EMAIL
AI + SDLC // 5 MIN DAILY