DeepSeek V4 - almost on the frontier, a fraction of the price
Summary
DeepSeek has released the DeepSeek-V4 series, featuring the DeepSeek-V4-Pro and DeepSeek-V4-Flash models. These Mixture of Experts (MoE) models provide a 1 million token context window at a significantly lower price point than competing frontier models such as GPT-5.4 and Gemini 3.1.
Key Points
- DeepSeek-V4-Pro features 1.6T total parameters with 49B active parameters.
- DeepSeek-V4-Flash features 284B total parameters with 13B active parameters.
- Both models support a 1 million token context window.
- DeepSeek-V4-Flash pricing is set at $0.14/M tokens input and $0.28/M tokens output.
- DeepSeek-V4-Pro pricing is set at $1.74/M tokens input and $3.48/M tokens output (a worked cost example follows this list).
- The models are released under the MIT license.
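To make the price gap concrete, here is a minimal sketch that computes the cost of a single request using the per-million-token prices listed above. The request shape (a full 1M-token input plus 10k output tokens) is a hypothetical example, not a published benchmark:

```python
# Rough cost sketch using the listed per-million-token prices.
# The request shape (1M input tokens, 10k output tokens) is a
# hypothetical example chosen to fill the context window.
PRICES = {
    # model: (input $/M tokens, output $/M tokens)
    "DeepSeek-V4-Flash": (0.14, 0.28),
    "DeepSeek-V4-Pro": (1.74, 3.48),
}

def request_cost(model, input_tokens, output_tokens):
    """Return the dollar cost of a single request."""
    price_in, price_out = PRICES[model]
    return (input_tokens / 1e6) * price_in + (output_tokens / 1e6) * price_out

for model in PRICES:
    cost = request_cost(model, input_tokens=1_000_000, output_tokens=10_000)
    print(f"{model}: ${cost:.4f} per full-context request")
```

Under these assumptions, a full-context Flash request comes to roughly $0.14 and a Pro request to roughly $1.77.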
Technical Details
The V4 series uses a Mixture of Experts (MoE) architecture optimized for long-context efficiency. At a 1M-token context, DeepSeek-V4-Pro requires only 27% of the single-token FLOPs and 10% of the KV cache size of DeepSeek-V3.2. DeepSeek-V4-Flash is leaner still, using only 10% of the single-token FLOPs and 7% of the KV cache size compared to V3.2.
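The following sketch applies these published fractions to a V3.2 baseline to show what they imply in absolute terms. The baseline values are hypothetical placeholders for illustration, not measured numbers:

```python
# Relative long-context efficiency at a 1M-token context, using the
# published fractions relative to DeepSeek-V3.2. The baseline values
# below are hypothetical placeholders, not measured numbers.
BASELINE_FLOPS = 1.0    # V3.2 per-token FLOPs (normalized to 1.0)
BASELINE_KV_GB = 100.0  # hypothetical V3.2 KV cache at 1M tokens, in GB

RATIOS = {
    # model: (fraction of V3.2 single-token FLOPs, fraction of V3.2 KV cache)
    "DeepSeek-V4-Pro": (0.27, 0.10),
    "DeepSeek-V4-Flash": (0.10, 0.07),
}

for model, (flops_frac, kv_frac) in RATIOS.items():
    print(
        f"{model}: {flops_frac * BASELINE_FLOPS:.2f}x baseline FLOPs/token, "
        f"~{kv_frac * BASELINE_KV_GB:.0f} GB KV cache "
        f"(if V3.2 needed {BASELINE_KV_GB:.0f} GB)"
    )
```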
In terms of deployment footprint, the Pro checkpoint weighs in at approximately 865GB on Hugging Face, while the Flash checkpoint is 160GB. DeepSeek-V4-Pro outperforms GPT-5.2 and Gemini-3.0-Pro on reasoning benchmarks, but it still trails GPT-5.4 and Gemini-3.1-Pro, placing it an estimated 3 to 6 months behind the current frontier.
Impact / Why It Matters
The significant reduction in inference costs makes these models a highly cost-effective option for developers building large-scale applications that require long context windows. For self-hosters, the 160GB Flash checkpoint brings local deployment within reach on high-memory hardware, particularly via quantized versions (a rough sizing sketch follows).
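As a back-of-the-envelope check, the sketch below derives the average bits per weight implied by the published checkpoint sizes and parameter counts, then estimates a hypothetical 4-bit requantization of Flash. It assumes checkpoint size is dominated by weights and scales linearly with bit-width, ignoring embeddings, activations, and KV-cache overhead:

```python
# Implied average bits per weight from the published checkpoint sizes,
# assuming size is dominated by weights (embeddings/metadata ignored).
MODELS = {
    # model: (total parameters, published checkpoint size in GB)
    "DeepSeek-V4-Pro": (1.6e12, 865),
    "DeepSeek-V4-Flash": (284e9, 160),
}

for model, (params, size_gb) in MODELS.items():
    bits_per_weight = size_gb * 1e9 * 8 / params
    print(f"{model}: ~{bits_per_weight:.1f} bits/weight as published")

# A hypothetical 4-bit requantization of Flash, assuming size scales
# linearly with bit-width and ignoring runtime overhead:
flash_params, _ = MODELS["DeepSeek-V4-Flash"]
print(f"Flash at 4-bit: ~{flash_params * 4 / 8 / 1e9:.0f} GB of weights")
```

Under these assumptions, both checkpoints already average roughly 4.3 to 4.5 bits per weight, so a 4-bit requant of Flash (~142GB of weights, before runtime overhead) offers only a modest saving over the published 160GB.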