Google launched Gemini 3.1 Flash-Lite on March 3 — and the name tells you everything. It’s the fastest and cheapest model in the entire Gemini 3 family. But the interesting part isn’t the price — it’s a feature I haven’t seen in any other model.
Configurable thinking depth
Flash-Lite lets you choose how much the model should think: minimal, low, medium, or high. Sounds trivial at first, but think about it: for a simple translation, you don’t need deep reasoning. For a complex code analysis, you do. Until now, you had to switch between different models for that. Flash-Lite makes it adjustable within a single model.
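To make the idea concrete, here is a minimal sketch of per-request routing. The four level names (minimal, low, medium, high) are from the announcement; the model id string and the `thinking_level` request field are placeholders for illustration, not confirmed API names, so check the Gemini API docs before copying this.

```python
# Sketch: pick a thinking depth per task instead of switching models.
# The level names come from the announcement; the request-dict shape and
# the "thinking_level" key are illustrative assumptions, not the real API.

SIMPLE_TASKS = {"translate", "classify", "moderate"}

def pick_thinking_level(task: str) -> str:
    """Cheap tasks get 'minimal'; heavy analysis gets 'high'."""
    if task in SIMPLE_TASKS:
        return "minimal"
    if task in {"summarize", "extract"}:
        return "low"
    if task in {"refactor", "code-analysis"}:
        return "high"
    return "medium"  # sensible default for everything else

def build_request(prompt: str, task: str) -> dict:
    # "gemini-3.1-flash-lite" is a hypothetical model id string here.
    return {
        "model": "gemini-3.1-flash-lite",
        "contents": prompt,
        "config": {"thinking_level": pick_thinking_level(task)},
    }
```

The point is that the routing logic lives in a few lines of your own code; before, the same switch meant maintaining calls to two or three different models.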
The numbers
- Price: $0.25 per million input tokens, $1.50 per million output tokens
- Speed: 2.5x faster time-to-first-token compared to its predecessor
- Output speed: 45% faster than Gemini 2.5 Flash
- Quality: On par with Gemini 2.5 Flash — at a fraction of the cost
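To get a feel for what those prices mean in practice, here is a quick back-of-the-envelope calculation using only the listed rates; the token counts in the example are made up for illustration.

```python
# Cost check at the listed preview prices:
# $0.25 per million input tokens, $1.50 per million output tokens.
INPUT_PER_M = 0.25
OUTPUT_PER_M = 1.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at Flash-Lite preview pricing."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# A typical short request: 1,000 tokens in, 500 tokens out.
cost = request_cost(1_000, 500)       # -> $0.001 per request
monthly = cost * 1_000_000            # -> $1,000 for a million such requests
```

At a tenth of a cent per request, the high-volume use cases Google is targeting start to make obvious sense.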
The model is available as a preview through the Gemini API in Google AI Studio and for enterprise customers via Vertex AI.
What’s it good for?
Google positions Flash-Lite for high-volume workloads: translations, content moderation, UI generation, simulations. Basically anywhere you have lots of requests and cost per request matters.
My take
The trend is clear: the biggest model doesn't win; the best-fitting one does. With Flash-Lite, Google has built a model that is perfectly adequate for 90% of everyday tasks and costs almost nothing. The configurable thinking depth is a clever feature, and I hope other providers adopt it soon.
For developers who want to integrate AI into their products, this is a real selling point. Not everyone needs a flagship model.