The most interesting idea behind Sakana’s Fugu isn’t that it wants to match Fable 5. It’s how it does it. Sakana didn’t train a new frontier model from scratch. Instead, there’s a small conductor — and it decides who plays.
What an orchestration model is
Fugu (and its larger sibling, Fugu Ultra) is built around an orchestrator of just 7 billion parameters. Its only job: pick the right external model for each part of a problem and route the request there. The pool of specialized frontier models is swappable — if one drops out, another steps in.
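To make the routing idea concrete, here is a minimal sketch. It is not Sakana's implementation: Fugu's conductor is a learned 7-billion-parameter model, and the post does not describe how it scores candidates. The `Backend` class, the hand-written `skills` table, and the names `model_a` and `model_b` are all hypothetical stand-ins, and the `call` functions only echo the prompt instead of hitting a real API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Backend:
    """A hypothetical external model in the pool (illustrative only)."""
    name: str
    skills: Dict[str, float]       # made-up score per task type
    call: Callable[[str], str]     # stand-in for a real API call
    available: bool = True


def route(task_type: str, pool: List[Backend]) -> Backend:
    """Pick the available backend with the highest score for this task type."""
    candidates = [b for b in pool if b.available]
    if not candidates:
        raise RuntimeError("no backend available")
    return max(candidates, key=lambda b: b.skills.get(task_type, 0.0))


def solve(subtasks: List[Tuple[str, str]], pool: List[Backend]) -> List[Tuple[str, str]]:
    """Route each (task_type, prompt) pair; return (backend name, answer) pairs."""
    results = []
    for task_type, prompt in subtasks:
        backend = route(task_type, pool)
        results.append((backend.name, backend.call(prompt)))
    return results


pool = [
    Backend("model_a", {"code": 0.9, "science": 0.6}, lambda p: f"[model_a] {p}"),
    Backend("model_b", {"code": 0.7, "science": 0.9}, lambda p: f"[model_b] {p}"),
]
subtasks = [("code", "fix the failing test"), ("science", "explain the result")]

first = solve(subtasks, pool)      # each part goes to its best-scoring backend
pool[0].available = False          # simulate model_a dropping out of the pool
second = solve(subtasks, pool)     # model_b steps in for both parts
```

In this toy version the "judgment" is just a lookup table. The point is the shape: the caller only sees `solve`, and a backend can disappear from the pool without the caller changing anything.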
The numbers Sakana cites are notable: 54.2 on SWE-Pro and 95.1 on GPQA-Diamond, achieved purely through smart routing rather than a single overpowering model. On certain benchmarks, Fugu draws level with Mythos without Sakana ever having trained a Mythos of its own.
David Ha’s thesis
David Ha, co-founder and CEO, summed it up on X: ‘Orchestration models are the next frontier, beyond bigger models.’ His argument: relying on a single provider for national infrastructure is a risk that recent export controls made impossible to ignore.
‘Access to top models can disappear overnight,’ Ha wrote. ‘Collective intelligence is the practical hedge against this concentration of power.’ That is exactly what Fugu is built to be: a hedge against vendor lock-in and geopolitical shocks.
Skepticism included
Sakana itself stays grounded. In a manual review of twelve public posts on launch day, the team counted three supportive, six skeptical, and three critical reactions. The idea excites people — but plenty want to see proof in daily use first.
My take
Conceptually this reminds me of mixture-of-experts, just one floor up: not experts inside a model, but whole models behind a single API. It’s clever, because it sidesteps the most expensive ingredient — training a frontier model — and bets on routing quality instead.
And that’s exactly the catch. The whole magic lives or dies with the conductor. If it picks wrong, the best pool in the world won’t help. An orchestrator is only as good as its judgment about who can solve the task — and that’s a hard, underrated problem.
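A back-of-the-envelope sketch shows how much rides on the conductor. It assumes a deliberately simplified two-model world: the router either picks the best model for a task or a weaker one. The function name and every number below are invented for illustration; none of them are Sakana's figures.

```python
def expected_accuracy(router_hit_rate: float, best_acc: float, other_acc: float) -> float:
    """Expected success rate when the router picks the best model with
    probability router_hit_rate and a weaker model otherwise (toy model)."""
    return router_hit_rate * best_acc + (1 - router_hit_rate) * other_acc


# Hypothetical numbers: the best model solves 90% of tasks, the other 50%.
for hit_rate in (1.0, 0.8, 0.5):
    print(hit_rate, round(expected_accuracy(hit_rate, 0.9, 0.5), 3))
```

Under these assumptions, a router that is right 80% of the time lands at 0.82 instead of 0.9, and a coin-flip router drops to 0.7. The pool sets the ceiling, but the routing decides how close you get to it.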
Still: after two weeks of export chaos, the lock-in question hits a nerve. Maybe the most interesting answer to ‘which model is best?’ is about to become: ‘why just one?’
Sources: