DeepSeek V4—almost on the frontier
Summary
DeepSeek has released the DeepSeek-V4 series, consisting of two preview models: DeepSeek-V4-Pro and DeepSeek-V4-Flash. These Mixture of Experts (MoE) models feature a 1 million token context window and are designed for high-efficiency, low-cost inference.
Key Points
- DeepSeek-V4-Pro contains 1.6T total parameters with 49B active parameters.
- DeepSeek-V4-Flash contains 284B total parameters with 13B active parameters.
- Both models support a 1 million token context window.
- The models are released under the MIT license.
- DeepSeek-V4-Flash pricing is $0.14/M input tokens and $0.28/M output tokens.
- DeepSeek-V4-Pro pricing is $1.74/M input tokens and $3.48/M output tokens.
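To make the price gap concrete, the per-request cost at the listed rates can be computed directly. The workload below (800K input tokens, 4K output tokens) is hypothetical, chosen to represent a long-context request:

```python
# Published per-million-token rates in USD, from the release notes above.
PRICING = {
    "DeepSeek-V4-Flash": {"input": 0.14, "output": 0.28},
    "DeepSeek-V4-Pro": {"input": 1.74, "output": 3.48},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-million-token rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# Hypothetical long-context request: 800K tokens in, 4K tokens out.
flash = request_cost("DeepSeek-V4-Flash", 800_000, 4_000)
pro = request_cost("DeepSeek-V4-Pro", 800_000, 4_000)
print(f"Flash: ${flash:.4f}  Pro: ${pro:.4f}")  # Flash: $0.1131  Pro: $1.4059
```

At these rates the Pro model costs roughly 12.4x more than Flash for the same request, which matches the ratio of their per-token prices.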
Technical Details
The V4 series implements significant optimizations for long-context processing. At a 1M-token context, DeepSeek-V4-Pro uses only 27% of the single-token FLOPs and 10% of the KV cache size of DeepSeek-V3.2. DeepSeek-V4-Flash is even leaner, at 10% of the single-token FLOPs and 7% of the KV cache size relative to V3.2.
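A quick way to read these ratios is to scale a baseline. The sketch below applies the published percentages to a hypothetical V3.2 baseline (normalized FLOPs of 1.0 and a 100 GB KV cache at 1M tokens, both assumed for illustration):

```python
# Published efficiency ratios at a 1M-token context, relative to DeepSeek-V3.2.
RELATIVE = {
    "DeepSeek-V4-Pro": {"flops": 0.27, "kv_cache": 0.10},
    "DeepSeek-V4-Flash": {"flops": 0.10, "kv_cache": 0.07},
}

def projected(baseline_flops: float, baseline_kv_gb: float, model: str) -> tuple[float, float]:
    """Scale hypothetical V3.2 baseline numbers by the published ratios."""
    r = RELATIVE[model]
    return baseline_flops * r["flops"], baseline_kv_gb * r["kv_cache"]

# Hypothetical baseline: V3.2 at 1.0 normalized FLOPs and 100 GB of KV cache.
flops, kv_gb = projected(1.0, 100.0, "DeepSeek-V4-Flash")
print(round(flops, 2), round(kv_gb, 1))  # 0.1 normalized FLOPs, 7.0 GB of KV cache
```

Under those assumed baseline numbers, Flash would need roughly a tenth of the compute per token and 7 GB of KV cache where V3.2 needed 100 GB.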
The Pro model's weights occupy approximately 865 GB on Hugging Face, while the Flash model's occupy 160 GB. On reasoning benchmarks, DeepSeek-V4-Pro-Max outperforms GPT-5.2 and Gemini-3.0-Pro, though it trails GPT-5.4 and Gemini-3.1-Pro by an estimated three to six months of capability progress.
Impact / Why It Matters
The V4 series provides a highly cost-effective option for large-scale, long-context inference, with the Flash model being the cheapest small-model option currently available. The open-weights MIT release also enables local deployment on high-memory hardware via quantized versions.