DeepSeek dropped its V4 model series on April 24 — and the pricing is so low you have to do a double-take. Two models are available: DeepSeek-V4-Pro with 1.6 trillion total parameters (49 billion active) and DeepSeek-V4-Flash with 284 billion parameters (13 billion active). Both under MIT license, both with a million-token context window.
The pricing speaks for itself
Simon Willison put together a comparison table that makes the picture crystal clear. DeepSeek-V4-Flash costs $0.14 per million input tokens — that’s cheaper than GPT-5.4 Nano and Gemini 3.1 Flash-Lite. V4-Pro comes in at $1.74 per million input tokens, well below Claude Sonnet 4.6 ($3) or GPT-5.4 ($2.50).
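To make the numbers concrete, here is a quick back-of-the-envelope sketch using only the input-token prices quoted above. It ignores output-token pricing and any caching discounts, and the model names are written as plain strings for comparison, not as actual API identifiers.

```python
# Input-token prices ($ per million tokens) as quoted in the article.
# Output pricing and prompt-caching discounts are not modeled here.
PRICES_PER_M_INPUT = {
    "DeepSeek-V4-Flash": 0.14,
    "DeepSeek-V4-Pro": 1.74,
    "GPT-5.4": 2.50,
    "Claude Sonnet 4.6": 3.00,
}

def input_cost(model: str, tokens: int) -> float:
    """Dollar cost of sending `tokens` input tokens to `model`."""
    return PRICES_PER_M_INPUT[model] / 1_000_000 * tokens

# Example: 100 requests per day, each using the full 1M-token window.
for model in PRICES_PER_M_INPUT:
    daily = input_cost(model, 1_000_000) * 100
    print(f"{model}: ${daily:.2f}/day")
```

At that (admittedly extreme) usage pattern, Flash works out to $14 a day where Claude Sonnet 4.6 would cost $300, which is the gap the comparison table is highlighting.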
How do they pull this off?
DeepSeek invested heavily in efficiency. According to their paper, V4-Pro uses just 27% of the compute of DeepSeek V3.2 at a million-token context — while performing better. Flash needs only 10%. That explains the pricing.
The architecture uses a ‘Hybrid Attention’ system that handles long contexts far more efficiently than classic transformer architectures. The KV cache — one of the biggest cost drivers for long prompts — has been reduced to 7-10% of its predecessor’s size.
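Why does the KV cache dominate long-prompt costs? A rough sizing sketch makes it clear. The layer, head, and dimension numbers below are placeholders, not DeepSeek's actual configuration (which isn't detailed here); the point is only to show how the cache grows linearly with context length, and what a cut to roughly 7-10% of the old size buys you at a million tokens.

```python
# Back-of-the-envelope KV-cache sizing for a standard transformer.
# All architecture numbers below are illustrative placeholders,
# NOT DeepSeek's real configuration.
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    # 2x for keys and values; bytes_per_value=2 assumes fp16/bf16.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value

baseline = kv_cache_bytes(tokens=1_000_000, layers=60, kv_heads=8, head_dim=128)
reduced = baseline * 0.08  # midpoint of the reported 7-10% range

print(f"baseline: {baseline / 2**30:.1f} GiB")
print(f"reduced:  {reduced / 2**30:.1f} GiB")
```

With these placeholder numbers the cache shrinks from a few hundred GiB to under 20 GiB per million-token request — the kind of reduction that lets a provider slash per-token prices.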
How good are the models, really?
DeepSeek is honest about where they stand: V4-Pro is competitive with frontier models but not quite at the level of GPT-5.4 or Gemini 3.1-Pro. Their paper describes a gap of ‘approximately 3 to 6 months.’ That’s still impressive for an open-source model.
The models are available as open weights on Hugging Face — Pro at 865 GB, Flash at 160 GB. The community is already waiting for quantized versions from the Unsloth team so Flash can run on consumer hardware.
Why this matters
DeepSeek V4 is the strongest evidence yet that open-source models are closing the gap with proprietary frontier models — at a fraction of the cost. For developers looking to optimize API costs, V4-Flash is an extremely attractive option. For anyone who wants to run models locally, the quantized Flash version could be a game-changer.