
Gemini 3.1 Flash-Lite — Google's Cheapest AI Model Packs a Punch


Google launches the cheapest model in the Gemini 3 family: one-eighth the cost of Pro, with configurable thinking depth, and the fastest Gemini model yet.


Google launched Gemini 3.1 Flash-Lite on March 3 — and the name tells you everything. It’s the fastest and cheapest model in the entire Gemini 3 family. But the interesting part isn’t the price — it’s a feature I haven’t seen in any other model.

Configurable thinking depth

Flash-Lite lets you choose how much the model should think: minimal, low, medium, or high. Sounds trivial at first, but think about it: for a simple translation, you don’t need deep reasoning. For a complex code analysis, you do. Until now, you had to switch between different models for that. Flash-Lite makes it adjustable within a single model.
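In practice, you'd pick the thinking depth per request based on the task. Here's a minimal sketch of that routing logic; note that the field name `thinking_level`, the request shape, and the model id string are assumptions for illustration, not the confirmed API schema — check the official Gemini API reference for the real parameter names.

```python
# Illustrative sketch: choose a thinking depth per task type before calling
# the Gemini API. The "thinking_level" field and the request dict shape are
# assumptions here, not the confirmed API schema.

THINKING_LEVELS = ("minimal", "low", "medium", "high")

def thinking_level_for(task: str) -> str:
    """Heuristic: shallow thinking for simple tasks, deep for hard ones."""
    simple = {"translation", "moderation"}
    hard = {"code_analysis", "simulation"}
    if task in simple:
        return "minimal"
    if task in hard:
        return "high"
    return "medium"  # reasonable default for everything in between

def build_request(task: str, prompt: str) -> dict:
    """Assemble an (illustrative) request payload with the chosen depth."""
    level = thinking_level_for(task)
    assert level in THINKING_LEVELS
    return {
        "model": "gemini-3.1-flash-lite",     # hypothetical model id string
        "contents": prompt,
        "config": {"thinking_level": level},  # assumed parameter name
    }

req = build_request("translation", "Translate 'Guten Morgen' to English.")
print(req["config"]["thinking_level"])  # minimal
```

The point is that the routing decision collapses into one parameter on one model, instead of a dispatch table of different models.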

The numbers

  • Price: $0.25 per million input tokens, $1.50 per million output tokens
  • Speed: 2.5x faster time-to-first-token compared to its predecessor
  • Output speed: 45% faster than Gemini 2.5 Flash
  • Quality: On par with Gemini 2.5 Flash — at a fraction of the cost
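
At those rates, a back-of-the-envelope cost estimate is a one-liner. The per-request token counts below are made-up workload numbers, purely for illustration:

```python
# Cost estimate at the quoted Flash-Lite rates:
# $0.25 per 1M input tokens, $1.50 per 1M output tokens.

INPUT_PER_M = 0.25   # USD per million input tokens
OUTPUT_PER_M = 1.50  # USD per million output tokens

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Total API cost in USD for a given token volume."""
    return (input_tokens / 1_000_000) * INPUT_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PER_M

# Example: 1M requests, each with ~500 input and ~200 output tokens
# (hypothetical workload numbers).
total = cost_usd(500 * 1_000_000, 200 * 1_000_000)
print(f"${total:,.2f}")  # $425.00
```

A million moderately sized requests for a few hundred dollars is exactly the high-volume territory Google is aiming at.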

The model is available as a preview through the Gemini API in Google AI Studio and for enterprise customers via Vertex AI.

What’s it good for?

Google positions Flash-Lite for high-volume workloads: translations, content moderation, UI generation, simulations. Basically anywhere you have lots of requests and cost per request matters.

My take

The trend is clear: the biggest model doesn’t win — the best-fitting one does. Google built a model with Flash-Lite that’s perfectly adequate for 90% of everyday tasks — and costs almost nothing. The configurable thinking depth is a clever feature that I hope other providers will adopt soon.

For developers who want to integrate AI into their products, this is a real selling point. Not everyone needs a flagship model.
