Sometimes a small weekend experiment shows where we are with AI tools better than any product launch. Simon Willison spotted Moebius on Hacker News: a compact inpainting model where you mark a region of an image and the model generates what should fill the gap. The catch: it normally needs PyTorch and NVIDIA CUDA to run.
0.2B sounds like “this could run in a browser”
What caught Willison's attention was the size: 0.2 billion parameters, small enough to try running directly in a browser via WebGPU. Instead of grinding through it himself, he handed the job to Claude Code, which took on the whole chain: convert the PyTorch model to ONNX, publish the result to Hugging Face, then build a web app and interface that loads and runs the model.
The remarkable part isn’t that any single step works — it’s that Opus 4.8 walks the entire path in one go. Model conversion, hosting, frontend: those are usually three different hats, and here one agent puts them on in sequence.
It runs. In every browser.
The result is a working demo that runs in Chrome, Firefox, and Safari. One finding Willison highlights: the CacheStorage API copes with model files of around 1.3 GB. In plain terms, inpainting can be a feature of a client-only web app. No inference server, no GPU cloud, no per-request API cost. The model files download into the browser cache once, and all computation happens locally after that.
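To make the caching idea concrete, here is a minimal sketch of a cache-first model loader. It is not Willison's code: the function name `loadModelBytes` and the small `ResponseLike`, `CacheLike`, and `FetchLike` shapes are assumptions made up for illustration, written so the example type-checks on its own. In a browser, the real `Cache` object returned by `caches.open(...)` and the global `fetch` should fit these shapes.

```typescript
// Minimal shapes of the pieces we use (hypothetical names for this sketch).
// In a browser, the real Response and Cache objects satisfy them.
interface ResponseLike {
  ok: boolean;
  status: number;
  clone(): ResponseLike;
  arrayBuffer(): Promise<ArrayBuffer>;
}

interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}

type FetchLike = (url: string) => Promise<ResponseLike>;

// Return the model bytes, downloading only when the cache has no copy.
async function loadModelBytes(
  url: string,
  cache: CacheLike,
  fetchFn: FetchLike,
): Promise<{ bytes: ArrayBuffer; fromCache: boolean }> {
  // First look for a previously stored copy.
  const cached = await cache.match(url);
  if (cached) {
    return { bytes: await cached.arrayBuffer(), fromCache: true };
  }

  // Cache miss: download the file over the network.
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Model download failed with status ${response.status}`);
  }

  // Store a clone, because a response body can only be read once.
  await cache.put(url, response.clone());
  return { bytes: await response.arrayBuffer(), fromCache: false };
}
```

In a page, this might be called as `loadModelBytes(modelUrl, await caches.open("models-v1"), fetch)`, where `modelUrl` and the cache name `"models-v1"` are placeholders. The first visit pays for the download; later visits read the bytes from the cache and hand them to whatever runtime executes the ONNX model.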
My take
This is exactly why I read Willison’s blog. Someone starts with “that looks small, wonder if it runs in a browser?” and ends up with a client-only app running a real ML model locally. That’s the kind of unplanned discovery these tools produce constantly — you ask one thing and end up somewhere else. What I find most interesting is the implication: if an agent can convert, host, and wrap a model into a web app, the barrier to just trying small models fully offline in the browser drops dramatically. Not every model is a tidy 0.2B — but a surprising number of useful ones are.