Google launched Gemini 3.1 Flash-Lite on March 3 — and the name tells you everything. It’s the fastest and cheapest model in the entire Gemini 3 family. But the interesting part isn’t the price — it’s a feature I haven’t seen in any other model.
Configurable thinking depth
Flash-Lite lets you choose how much the model should think: minimal, low, medium, or high. Sounds trivial at first, but think about it: for a simple translation, you don’t need deep reasoning. For a complex code analysis, you do. Until now, you had to switch between different models for that. Flash-Lite makes it adjustable within a single model.
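To make the idea concrete, here is a minimal sketch of per-request routing. The four level names (minimal, low, medium, high) are from the announcement; the model id string and the `thinking_level` request field are placeholders for illustration, not confirmed API names, so check the Gemini API docs before copying this.

```python
# Sketch: pick a thinking depth per task instead of switching models.
# The level names come from the announcement; the request-dict shape and
# the "thinking_level" key are illustrative assumptions, not the real API.

SIMPLE_TASKS = {"translate", "classify", "moderate"}

def pick_thinking_level(task: str) -> str:
    """Cheap tasks get 'minimal'; heavy analysis gets 'high'."""
    if task in SIMPLE_TASKS:
        return "minimal"
    if task in {"summarize", "extract"}:
        return "low"
    if task in {"refactor", "code-analysis"}:
        return "high"
    return "medium"  # sensible default for everything else

def build_request(prompt: str, task: str) -> dict:
    # "gemini-3.1-flash-lite" is a hypothetical model id string here.
    return {
        "model": "gemini-3.1-flash-lite",
        "contents": prompt,
        "config": {"thinking_level": pick_thinking_level(task)},
    }
```

The point is that the routing logic lives in a few lines of your own code; before, the same switch meant maintaining calls to two or three different models.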
The numbers
- Price: $0.25 per million input tokens, $1.50 per million output tokens
- Speed: 2.5x faster time-to-first-token compared to its predecessor
- Output speed: 45% faster than Gemini 2.5 Flash
- Quality: On par with Gemini 2.5 Flash — at a fraction of the cost
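To get a feel for what those prices mean in practice, here is a quick back-of-the-envelope calculation using only the listed rates; the token counts in the example are made up for illustration.

```python
# Cost check at the listed preview prices:
# $0.25 per million input tokens, $1.50 per million output tokens.
INPUT_PER_M = 0.25
OUTPUT_PER_M = 1.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at Flash-Lite preview pricing."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A typical short request: 1,000 tokens in, 500 tokens out.
cost = request_cost(1_000, 500)       # -> $0.001 per request
monthly = cost * 1_000_000            # -> $1,000 for a million such requests
```

At a tenth of a cent per request, the high-volume use cases Google is targeting start to make obvious sense.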
The model is available as a preview through the Gemini API in Google AI Studio and for enterprise customers via Vertex AI.
What’s it good for?
Google positions Flash-Lite for high-volume workloads: translations, content moderation, UI generation, simulations. Basically anywhere you have lots of requests and cost per request matters.
My take
The trend is clear: the biggest model doesn't win; the best-fitting one does. With Flash-Lite, Google has built a model that is perfectly adequate for 90% of everyday tasks and costs almost nothing. The configurable thinking depth is a clever feature, and I hope other providers adopt it soon.
For developers who want to integrate AI into their products, this is a real selling point. Not everyone needs a flagship model.