AI-generated

Gemini 3.1 Flash Image Goes GA – and Learns Video-to-Image


Google moves its image model Gemini 3.1 Flash Image (internally 'Nano Banana 2'), alongside Gemini 3 Pro Image, to general availability. The preview variants are being shut down. The fun new bit: you can now feed in a video to generate thumbnails, posters, or infographics from it.


A quick look over the fence at the competition: Google has moved its image model Gemini 3.1 Flash Image, known internally by the nickname “Nano Banana 2”, to general availability (GA), together with Gemini 3 Pro Image. In turn, the existing -preview variants are being shut down.

From preview to production

Nano Banana 2 had been in preview since earlier this year and quickly became a developer favorite — mostly because it pairs fast, high-quality image generation with conversational editing at a mainstream price. That blend of low latency and solid quality is exactly what makes it the efficient counterpart to the bigger Gemini 3 Pro Image.

With the GA step, the preview era is over. If you’re still pointing at gemini-3.1-flash-image-preview or gemini-3-pro-image-preview, migrate soon — the preview endpoints are being retired.
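If you're not sure where the preview IDs are still hiding, a small scan helps. The following is a minimal sketch, not an official migration tool: find_preview_refs and guess_ga_id are hypothetical helper names, and the idea that the GA ID is simply the preview ID without the "-preview" suffix is an assumption that should be checked against the release notes.

```python
# Preview model IDs named in this article.
PREVIEW_MODEL_IDS = (
    "gemini-3.1-flash-image-preview",
    "gemini-3-pro-image-preview",
)


def find_preview_refs(text: str) -> list[tuple[int, str]]:
    """Return (line_number, model_id) for every preview ID found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for model_id in PREVIEW_MODEL_IDS:
            if model_id in line:
                hits.append((lineno, model_id))
    return hits


def guess_ga_id(preview_id: str) -> str:
    """ASSUMPTION: the GA ID is the preview ID minus '-preview'.

    This is a guess for illustration; verify the real GA model IDs
    in the Gemini API release notes before relying on it.
    """
    return preview_id.removesuffix("-preview")


if __name__ == "__main__":
    sample = 'MODEL = "gemini-3.1-flash-image-preview"\nTIMEOUT = 30\n'
    for lineno, model_id in find_preview_refs(sample):
        print(f"line {lineno}: {model_id} -> maybe {guess_ga_id(model_id)}")
```

Run it over config files or source files that you read in as text; every hit is a place that needs a look before the preview endpoints go away.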

The real highlight: video-to-image

More interesting than the GA label, to me, is a new capability exclusive to Gemini 3.1 Flash Image: video-to-image generation. You pass a video file as multimodal context — along with a text prompt — and the model generates matching stills from it. Google lists use cases like high-quality thumbnails, cinematic posters, and summary infographics.
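To make "video plus text prompt in, image out" a bit more concrete, here is a rough sketch of what such a request could look like against the Gemini REST API. It only builds the URL and JSON body and sends nothing. Assumptions: build_video_to_image_request is a hypothetical helper, the model ID is passed in by the caller because the GA ID is not confirmed here, and the video is assumed to go in as an inline part the same way other multimodal input does.

```python
import base64
import json

API_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_video_to_image_request(
    video_bytes: bytes,
    prompt: str,
    model: str,
    mime_type: str = "video/mp4",
) -> tuple[str, str]:
    """Build (url, json_body) for a generateContent call with video + prompt.

    ASSUMPTION: video-to-image accepts the video as an inline_data part,
    like other multimodal input. Nothing is sent over the network here.
    """
    url = f"{API_BASE}/models/{model}:generateContent"
    body = {
        "contents": [
            {
                "parts": [
                    # The video clip as base64-encoded inline data.
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(video_bytes).decode("ascii"),
                        }
                    },
                    # The text prompt describing the desired still.
                    {"text": prompt},
                ]
            }
        ],
        # Ask for image output in addition to text.
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }
    return url, json.dumps(body)


if __name__ == "__main__":
    # "gemini-3.1-flash-image" is an assumed GA model ID, not confirmed.
    url, body = build_video_to_image_request(
        b"not a real video",
        "Create a cinematic poster for this clip.",
        model="gemini-3.1-flash-image",
    )
    print(url)
    print(body[:80], "...")
```

Inline data only makes sense for short clips; for larger videos, uploading the file first and referencing it in the request is the more likely route. The generated image would come back as an inline data part in the response, which you then base64-decode and write to disk.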

That’s more than a gimmick. Anyone who regularly produces video content knows the hassle of building good preview images for it. A model that actually understands the clip and derives a coherent keyframe from it saves exactly that tedious in-between step.

My take

clauding.de is mostly about Claude and Anthropic — but the pace Google is setting on image models is hard to ignore. Competition in multimodal models pushes the whole field forward, and in the end we all benefit. Video-to-image is one of those features that sounds unremarkable and can save real time in day-to-day work. I’m curious when similar multimodal bridges become standard elsewhere too.

Sources: Gemini API Release Notes, Google Cloud: Generate images from video with Gemini