DeepSeek V4—almost on the frontier
Summary
DeepSeek has released the DeepSeek-V4 series, consisting of two preview models: DeepSeek-V4-Pro and DeepSeek-V4-Flash. These Mixture of Experts (MoE) models feature a 1 million token context window and are designed for high-efficiency, low-cost inference.
Key Points
- DeepSeek-V4-Pro contains 1.6T total parameters with 49B active parameters.
- DeepSeek-V4-Flash contains 284B total parameters with 13B active parameters.
- Both models support a 1 million token context window.
- The models are released under the MIT license.
- DeepSeek-V4-Flash pricing is $0.14/M input tokens and $0.28/M output tokens.
- DeepSeek-V4-Pro pricing is $1.74/M input tokens and $3.48/M output tokens.
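To make the price gap concrete, the per-request cost at the listed rates can be computed directly. The workload below (800K input tokens, 4K output tokens) is hypothetical, chosen to represent a long-context request:

```python
# Published per-million-token rates in USD, from the release notes above.
PRICING = {
    "DeepSeek-V4-Flash": {"input": 0.14, "output": 0.28},
    "DeepSeek-V4-Pro": {"input": 1.74, "output": 3.48},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-million-token rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# Hypothetical long-context request: 800K tokens in, 4K tokens out.
flash = request_cost("DeepSeek-V4-Flash", 800_000, 4_000)
pro = request_cost("DeepSeek-V4-Pro", 800_000, 4_000)
print(f"Flash: ${flash:.4f}  Pro: ${pro:.4f}")  # Flash: $0.1131  Pro: $1.4059
```

At these rates the Pro model costs roughly 12.4x more than Flash for the same request, which matches the ratio of their per-token prices.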
Technical Details
The V4 series implements significant optimizations for long-context processing. At a 1M-token context, DeepSeek-V4-Pro uses only 27% of the single-token FLOPs and 10% of the KV cache size of DeepSeek-V3.2. DeepSeek-V4-Flash is even leaner, at 10% of the single-token FLOPs and 7% of the KV cache size relative to V3.2.
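A quick way to read these ratios is to scale a baseline. The sketch below applies the published percentages to a hypothetical V3.2 baseline (normalized FLOPs of 1.0 and a 100 GB KV cache at 1M tokens, both assumed for illustration):

```python
# Published efficiency ratios at a 1M-token context, relative to DeepSeek-V3.2.
RELATIVE = {
    "DeepSeek-V4-Pro": {"flops": 0.27, "kv_cache": 0.10},
    "DeepSeek-V4-Flash": {"flops": 0.10, "kv_cache": 0.07},
}

def projected(baseline_flops: float, baseline_kv_gb: float, model: str) -> tuple[float, float]:
    """Scale hypothetical V3.2 baseline numbers by the published ratios."""
    r = RELATIVE[model]
    return baseline_flops * r["flops"], baseline_kv_gb * r["kv_cache"]

# Hypothetical baseline: V3.2 at 1.0 normalized FLOPs and 100 GB of KV cache.
flops, kv_gb = projected(1.0, 100.0, "DeepSeek-V4-Flash")
print(round(flops, 2), round(kv_gb, 1))  # 0.1 normalized FLOPs, 7.0 GB of KV cache
```

Under those assumed baseline numbers, Flash would need roughly a tenth of the compute per token and 7 GB of KV cache where V3.2 needed 100 GB.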
The Pro model's weights occupy approximately 865 GB on Hugging Face, while the Flash model's occupy 160 GB. On reasoning benchmarks, DeepSeek-V4-Pro-Max outperforms GPT-5.2 and Gemini-3.0-Pro, though it trails GPT-5.4 and Gemini-3.1-Pro by an estimated three to six months of capability progress.
Impact / Why It Matters
The V4 series provides a highly cost-effective option for large-scale, long-context inference, with the Flash model being the cheapest small-model option currently available. The open-weights MIT release also enables local deployment on high-memory hardware via quantized versions.