2 min read AI-generated

Cursor Composer 2.5: Matching Opus 4.7 at One-Tenth the Cost

Cursor just shipped Composer 2.5, a coding model that matches Claude Opus 4.7 and GPT-5.5 on benchmarks — at a fraction of the price.

The AI coding landscape has a new contender. Composer 2.5 is a purpose-built coding model that, according to Cursor, matches Claude Opus 4.7 and GPT-5.5 on key benchmarks at roughly one-tenth the cost.

What Composer 2.5 Can Do

Composer 2.5 is built on Moonshot’s Kimi K2.5 and was trained with 25x more synthetic training tasks than its predecessor. It scores 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1.

The numbers may look dry, but the message is clear: a specialized coding model can compete with the big frontier models when it is trained specifically for the task.

The Price Makes the Difference

At $0.50 per million input tokens and $2.50 per million output tokens, Composer 2.5 sits well below the API prices of Claude Opus 4.7 or GPT-5.5. For developers who generate a lot of code, the difference adds up fast.
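To make the pricing concrete, here is a minimal sketch of the arithmetic. It uses only the two per-token prices quoted above. The function name and the monthly token volumes are hypothetical, chosen purely for illustration, and real bills may differ (caching, plan discounts, and similar factors are not modeled).

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 0.50
OUTPUT_PRICE_PER_MTOK = 2.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload at the quoted Composer 2.5 prices."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


# Hypothetical monthly workload (an assumption, not a figure from the article):
# 40M input tokens and 8M output tokens.
monthly = estimate_cost(40_000_000, 8_000_000)
print(f"Estimated monthly cost: ${monthly:.2f}")
```

Swapping in another model's published per-token prices for the two constants gives a like-for-like comparison on the same workload.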

What Cursor Does Differently

Instead of taking a general-purpose model and pointing it at code, Cursor went the other way: it took an open-source base model and optimized it for coding with reinforcement learning. That work includes more complex RL environments, new learning methods, and targeted textual feedback for better credit assignment.

The result is a model that stays more reliable during long coding sessions, follows complex instructions better, and communicates more clearly about what it’s doing.

What This Means for Claude Code

For Anthropic, Composer 2.5 is a signal. Claude Code uses Claude models as its backend, and it works extremely well. But when a specialized model delivers similar results at a fraction of the cost, the cost-benefit equation shifts.

At the same time, it shows that the market for AI coding tools continues to fragment. Cursor, Claude Code, GitHub Copilot, and OpenAI’s Codex are each building their own stack, and the models are becoming increasingly interchangeable.

For us developers, that is mostly good news: more competition means better tools and lower prices.
