
1 Million Tokens, No Surcharge — Claude Makes Long Context the Default


Anthropic drops the long-context premium entirely. A 1M-token request now costs exactly the same per token as a 9,000-token one. Here is what that means for the API.


Remember when Anthropic announced the million-token limit for Opus 4.6 back in February? Impressive — but there was a catch. Going beyond 200,000 tokens meant paying a surcharge: 2x on input, 1.5x on output. For many use cases, that was a dealbreaker.

That’s over now.

What Changed

As of March 13, the 1M token context window for Claude Opus 4.6 and Sonnet 4.6 is GA — generally available, no beta header required, no premium pricing. A 900,000-token request costs exactly the same per token as a 9,000-token one.

The numbers: Opus 4.6 stays at $5 per million input tokens and $25 per million output tokens; Sonnet 4.6 at $3 and $15. That holds regardless of how much context you push in.
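With a single flat rate, estimating a request's cost comes down to one multiplication per direction. A minimal sketch using the figures above (the model keys are just labels for this example, not official identifiers):

```python
# Flat per-token pricing: the rate no longer depends on context length.
# Rates in USD per million tokens, taken from the figures in this article.
RATES = {
    "opus-4.6": {"input": 5.00, "output": 25.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# A 900,000-token prompt is priced per token exactly like a 9,000-token one:
big = estimate_cost("opus-4.6", 900_000, 2_000)   # 4.50 input + 0.05 output = 4.55
small = estimate_cost("opus-4.6", 9_000, 2_000)   # 0.045 input + 0.05 output = 0.095
```

No second price table, no threshold check: the only variables are token counts and the model's flat rate.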

On top of that, Anthropic raised the media limit from 100 to 600 images or PDF pages per request. For anyone working with large documents, that’s a noticeable improvement.
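If a document exceeds even the new cap, the usual approach is to split its pages into request-sized batches. A trivial sketch, assuming the 600-item limit applies per request as described above:

```python
def batch_pages(pages: list, limit: int = 600) -> list:
    """Split a list of PDF pages/images into batches that fit one request each."""
    return [pages[i:i + limit] for i in range(0, len(pages), limit)]

# A 1,500-page document now needs 3 requests instead of 15 under the old cap of 100.
batches = batch_pages(list(range(1500)))
sizes = [len(b) for b in batches]   # [600, 600, 300]
```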

Who Benefits

On the API side, you no longer need the anthropic-beta header for requests over 200K tokens — it just works. Max, Team, and Enterprise users get the full 1M context in both the Claude app and Claude Code.

This is particularly exciting for use cases like whole-repository code analysis, legal document review, or processing lengthy research papers. Previously, you either had to chunk your input or swallow the surcharge. Neither is necessary anymore.
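In practice, that means a long-context call is now just an ordinary Messages API request with the default headers. A sketch that builds such a request with the standard library only (the model string is an assumption based on the names in this article; check the official model list before using it):

```python
import json

def build_request(api_key: str, document: str) -> dict:
    """Assemble a long-context Messages API request as a plain dict.

    Before GA, requests over 200K input tokens needed an extra
    `anthropic-beta` header; with flat pricing and GA, the default
    headers are enough.
    """
    return {
        "url": "https://api.anthropic.com/v1/messages",
        "headers": {
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
            # Note: no "anthropic-beta" entry required anymore.
        },
        "body": json.dumps({
            "model": "claude-opus-4-6",   # assumed name, see model list
            "max_tokens": 4096,
            "messages": [
                {"role": "user",
                 "content": f"Review this codebase:\n\n{document}"},
            ],
        }),
    }

req = build_request("sk-ant-...", "<~900K tokens of source files>")
assert "anthropic-beta" not in req["headers"]
```

The same request shape works whether the prompt is 9K or 900K tokens; the only thing that changes is the size of the `content` string.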

Benchmark: Opus 4.6 Hits 78.3% on MRCR v2

Anthropic points to a score of 78.3 percent on the MRCR v2 benchmark — a test that measures recall and reasoning at maximum context. This shows the million tokens aren’t just marketing — the model can actually work with them effectively.

My Take

Dropping the surcharge is one of those quiet but important changes. No developer enjoys calculating with two different price tables depending on context length. Anthropic simplifying this makes the API more predictable — and makes 1M tokens genuinely practical for everyday use. Well played.
