In the shadow of Anthropic’s and OpenAI’s billion-dollar rounds, a smaller player is working on a problem that affects everyone: how do you make AI faster and cheaper?
What Makes Groq Different
Groq builds its own chips, called LPUs (Language Processing Units), that are optimized specifically for inference: not for training, but for the moment when a model actually responds. And in that domain, Groq is fast. Very fast.
The company has positioned itself as an inference neocloud: customers can access models like Llama and Mixtral through Groq’s API and get responses in a fraction of the time conventional GPU-based systems need.
The Funding
According to TechCrunch, Groq is in talks for a $650 million funding round. Existing investors Disruptive and Infinitium have committed to filling the round if other investors pass.
The timing is no coincidence. Nvidia recently closed a $20 billion deal with Groq. It is not an acquisition but a strategic partnership, and it shows that even Nvidia recognizes the value of specialized inference hardware.
Why Inference Is the Bottleneck
Training gets the headlines, but inference is the actual business. Every conversation with Claude, every ChatGPT query, every Gemini Spark task: all of it is inference. And the more AI agents work in the background, the more pressing the question becomes: how many tokens per second can your system handle?
Groq is betting that the answer to that question doesn’t have to be ‘more Nvidia GPUs.’ Whether this bet pays off depends on whether enough customers are willing to leave the Nvidia ecosystem.
Source: TechCrunch