DiffusionGemma – Google Lets Text Emerge Like an Image

On June 10, Google DeepMind released a model that works differently from almost every language model in use today: DiffusionGemma. Instead of generating text token by token, left to right, it works like an image diffusion model, and that’s more than a technical curiosity.

How noise becomes text

Classic language models write one token, then the next, then the next. DiffusionGemma takes a different route: it starts with a block of 256 random placeholder tokens and refines the whole block across several passes until readable text emerges, much as an image model gradually carves a picture out of noise.
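To make that loop concrete, here is a toy sketch in Python. It is not Google's sampler: the `toy_denoiser` function is a hypothetical stand-in that peeks at a known target string instead of running a real model, and the "keep the most confident positions, re-mask the rest" schedule is just one common way to illustrate the idea. The point is only the shape of the process: every pass proposes tokens for all positions at once, and the block gets less noisy with each pass.

```python
import random

MASK = "_"  # stand-in for a placeholder token
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def toy_denoiser(block, target, rng):
    """Hypothetical stand-in for the model.

    Proposes a (token, confidence) pair for every position in one call.
    A real model would predict from the current block; this toy peeks at
    a known target so the loop stays easy to follow.
    """
    proposals = []
    for i in range(len(block)):
        confidence = rng.random()
        # Low-confidence guesses are random letters, so they are often wrong.
        token = target[i] if confidence > 0.3 else rng.choice(ALPHABET)
        proposals.append((token, confidence))
    return proposals


def refine(target, passes=8, seed=0):
    """Return the state of the block after each refinement pass."""
    rng = random.Random(seed)
    block = [MASK] * len(target)
    history = []
    for step in range(1, passes + 1):
        proposals = toy_denoiser(block, target, rng)
        # Keep a growing share of the most confident positions, re-mask the rest.
        keep = len(block) * step // passes
        ranked = sorted(range(len(block)),
                        key=lambda i: proposals[i][1], reverse=True)
        kept = set(ranked[:keep])
        block = [proposals[i][0] if i in kept else MASK
                 for i in range(len(block))]
        history.append("".join(block))
    return history


if __name__ == "__main__":
    for line in refine("text emerges from noise"):
        print(line)
```

Because every position is re-proposed on each pass, a token kept early can still be overwritten later. In this toy the final block can still contain wrong letters, since the fake denoiser guesses at random when it is unsure.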

The trick: each pass updates all 256 tokens in parallel, in one forward pass of the model, not one after another. A full block therefore takes several forward passes, where a token-by-token model would need 256. That makes the model blazing fast: on a single H100 it manages over 1,000 tokens per second, on a GeForce RTX 5090 still more than 700, up to four times faster than regular Gemma 4. And because it sharpens its own output across the passes, it can self-correct as it goes.
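A quick back-of-the-envelope calculation shows what those figures mean in wall-clock time. The throughput numbers and the block size are the ones quoted above; the helper names are made up for this sketch. The "implied baseline" assumes the four-times figure applies to the H100 measurement, which the sources cited here do not spell out.

```python
BLOCK_TOKENS = 256  # block size quoted in the article


def seconds_per_block(tokens_per_second):
    """Time to emit one 256-token block at a given throughput."""
    return BLOCK_TOKENS / tokens_per_second


def implied_baseline(tokens_per_second, speedup=4.0):
    """Throughput of a model that is `speedup` times slower (an assumption)."""
    return tokens_per_second / speedup


h100 = seconds_per_block(1000)      # 0.256 s per block
rtx5090 = seconds_per_block(700)    # about 0.366 s per block
baseline = implied_baseline(1000)   # 250 tokens per second

if __name__ == "__main__":
    print(f"H100:     {h100:.3f} s per block")
    print(f"RTX 5090: {rtx5090:.3f} s per block")
    print(f"Implied Gemma 4 baseline: {baseline:.0f} tokens/s")
```

Since the quoted throughputs are lower bounds ("over 1,000", "more than 700"), the per-block times are upper bounds. And since "up to four times" is a best case, the real baseline may be faster than 250 tokens per second.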

Technically it’s a mixture-of-experts model with 26 billion parameters, of which only 3.8 billion are active per step. It is released under an Apache 2.0 license and is available on Hugging Face, Kaggle and in Google’s Vertex AI Model Garden.
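For readers new to mixture-of-experts, the sketch below shows the general idea of why only a fraction of the parameters runs per step: a router scores the experts and only the top few are used. The parameter counts come from the paragraph above; the number of experts, the value of k and the function names are illustrative assumptions, not DiffusionGemma's actual configuration.

```python
import math

TOTAL_PARAMS_B = 26.0   # from the article
ACTIVE_PARAMS_B = 3.8   # from the article
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B  # roughly 0.146


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their weights.

    Generic top-k routing, shown for illustration only; the routing scheme
    DiffusionGemma uses is not described in the article.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]


if __name__ == "__main__":
    print(f"Active share of parameters: {active_fraction:.1%}")
    # Four hypothetical experts, two of them used for this token.
    print(route_top_k([0.1, 2.0, -1.0, 1.5], k=2))
```

The active share of parameters is not the same as the share of experts used. Layers that every token passes through, such as attention and embeddings, count as active regardless of routing.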

The catch

Speed comes at a price. DiffusionGemma scores lower than regular Gemma 4 on established benchmarks like MMLU and on coding tasks. Google says so plainly: the model is experimental. For production use cases where quality matters, Google still recommends Gemma 4.

My take: I love this kind of thing. For years every major model has generated text the same way – token by token, front to back. DiffusionGemma questions that basic assumption. Yes, the quality doesn’t match the best models yet. But speed is a killer feature for agents that think in loops and churn out thousands of tokens at a stretch. And the fact that Google ships it openly under Apache 2.0 means the whole community can start experimenting with it now. The next standard approach often grows out of exactly these kinds of experiments.

Sources: VentureBeat: DiffusionGemma, SiliconANGLE, Hugging Face: google/diffusiongemma-26B-A4B-it