2 min read AI-generated

Claude Opus 4.8: Fewer Hallucinations, More Honesty

Copy article as Markdown

Anthropic's new flagship model prioritizes honesty over confidence. Opus 4.8 would rather say 'I don't know' than give you a wrong answer.

Featured image for "Claude Opus 4.8: Fewer Hallucinations, More Honesty"

Anthropic released Claude Opus 4.8 on May 28 — the latest upgrade to their flagship model line. The most interesting part isn’t what it does better. It’s what it stopped doing: pretending to know things it doesn’t.

What’s new?

Opus 4.8 achieved the lowest incorrect-answer rate of any model tested in benchmarks. Not because it answers more questions correctly, but because it abstains when it’s uncertain. Sounds simple, but it’s a genuinely useful feature.

On top of that, there are improvements in coding, agentic tasks, and professional work. The model can now launch Dynamic Workflows — autonomously breaking tasks into many sub-agents and orchestrating them. For larger projects, this is going to be a game changer.

Claude Code 2.1.154 ships alongside

Together with Opus 4.8, Anthropic also released Claude Code 2.1.154. The update makes Opus 4.8 the default model (with high-effort as the default setting), brings Dynamic Workflows to Claude Code, and improves Fast Mode. Plus better plugin management, more stable background sessions, and a long list of bug fixes.

The technical details

Pricing stays the same: $5 per million input tokens, $25 per million output tokens. The context window remains at 1,000,000 tokens, max output at 128,000 tokens. Training data cutoff is January 2026.

One interesting addition: Opus 4.8 accepts system messages mid-conversation — not just at the beginning. This makes it easier to adjust instructions during long conversations without restating the entire system prompt.

My take

The direction is right. Instead of just getting “smarter,” Anthropic is making Claude more honest. A model that knows its own limitations is more valuable in everyday work than one that always has an answer — even when it’s wrong.

Dynamic Workflows are the other highlight. When Claude can autonomously coordinate dozens of agents, it shifts the boundary of what you can expect from a single prompt. That’s going to be interesting to watch.


Sources: