Model Routing: The Cost-Cutting Trick That Should Worry OpenAI and Anthropic

For two years, the playbook was simple: take the strongest model and send everything through it. Whether it was a basic summary or a complex code analysis — everything went to GPT-5.5 or Claude Opus. Convenient, but expensive. Now the CFOs are pulling the plug.

What model routing is

Model routing is essentially an intelligent dispatcher: simple tasks go to cheap, fast models — often open-source models from China or smaller variants like Haiku. Only the truly demanding tasks land on expensive frontier models. The result: same quality, a fraction of the cost.

Why OpenAI and Anthropic should worry

If companies only send the hardest 20 percent of tasks to premium models, token volumes collapse. And with them, revenues. OpenAI’s and Anthropic’s valuations — $850 billion and $965 billion respectively — assume endless growing demand at premium prices. Model routing challenges exactly that assumption.

The numbers are dramatic

The overspending problem is severe: some companies reported blowing through their entire 2026 token budget by April. Uber froze its developers’ Claude Code licenses after burning through the budget in four months. Microsoft revoked its own developers’ Claude Code licenses months after enabling them.

The response is model routing. Pricing power is shifting — away from the AI labs, toward the buyers.

My take

This is a natural maturation process. In the early phase of any technology, everything gets tried regardless of cost. Now comes the phase where efficiency matters. For Anthropic and OpenAI, that means they don’t just need to build the best models — they need a pricing strategy that survives model routing. Claude Code’s new fallback model feature in version 2.1.166 is a step in exactly that direction.

Sources: CNBC, TechCrunch