Google ships Gemini 3.5 Flash: GA, agent…

GOOGLE PUB_DATE: 2026.05.20

GOOGLE SHIPS GEMINI 3.5 FLASH: GA, AGENT-READY, AND FAST ENOUGH TO MATTER

Google launched Gemini 3.5 Flash, a general-availability model tuned for agentic workflows with big speed gains and lower API pricing than Pro. Google says 3.5...

Google launched Gemini 3.5 Flash, a general-availability model tuned for agentic workflows with big speed gains and lower API pricing than Pro.

Google says 3.5 Flash is its fastest agent/coding model yet—up to 4x quicker than rival frontier models—and it’s live across the Gemini app, Search AI Mode, and the Gemini API today (Interesting Engineering, Lifehacker).

Ars reports ~300 tokens/sec and API pricing at $1.50/M input, $9/M output (cheaper than Pro), alongside a 1M-token context and 65k output window noted by Simon Willison (Ars Technica, Simon Willison).

Google is also piloting a new Interactions API (server-side history) and rolling 3.5 Flash through Antigravity, AI Studio, and Android Studio; the llm-gemini plugin adds immediate support (Simon Willison, llm-gemini 0.32, llm-gemini 0.32a0).

[ WHY_IT_MATTERS ]

01.

3.5 Flash pairs frontier-level reasoning with much lower latency, making long-running agent jobs and code automation more feasible at scale.

02.

API pricing below Pro shifts cost/perf math for high-volume workloads where output tokens dominate.

[ WHAT_TO_TEST ]

terminal
Benchmark your real agent flows (tool use, retries, parallel steps) on 3.5 Flash vs current model for latency, cost per task, and accuracy.
terminal
Soak-test long-horizon jobs with backpressure, cancellation, and idempotency; validate output-token spikes and guardrails.

[ BROWNFIELD_PERSPECTIVE ]

Legacy codebase integration strategies...

01.
Drop 3.5 Flash behind your LLM gateway and run canary routing; keep fallbacks to Pro/other vendors for edge cases.
02.
Audit tokenizer and format drift in prompts/functions; update observability to track per-step token and error rates.

[ GREENFIELD_PERSPECTIVE ]

Fresh architecture paradigms...

01.
Design agents around the Interactions API for server-side history and cheaper context reuse.
02.
Default to 3.5 Flash for orchestration and coding tasks; isolate modules that may later need Pro-class reasoning.

Enjoying_this_story?

Get daily GOOGLE + SDLC updates.

Practical tactics you can ship tomorrow
Tooling, workflows, and architecture notes
One short email each weekday

arrow_back

PREVIOUS_DATA_LOG

GitHub Copilot app brings Agent Merge to automate CI fixes and PR merges

Initialize_Return_to_Core

LINK_STATUS: 127.0.0.1 (SECURE)

NEXT_DATA_LOG

OlmoEarth v1.1 cuts geospatial inference compute by up to 3x without hurting accuracy

arrow_forward