DeepSeek announced major price reductions for its API service on Saturday, cutting input cache hit fees to one-tenth of their original price and offering a limited-time 75% discount on its flagship V4-Pro model through May 5.
The promotional price for V4-Pro input cache hits is 0.025 yuan (approximately $0.0036) per million tokens. The model’s standard rates during the promotional period are 3 yuan for input and 6 yuan for output per million tokens. These rates significantly undercut Western competitors, whose output prices range from $12 to $25 per million tokens, according to OpenRouter data.
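To make the figures concrete, the short Python sketch below estimates what a workload would cost at the promotional rates quoted above. It is illustrative only: the function name and token-count parameters are hypothetical, and it assumes the 3 yuan input rate applies to input tokens that miss the cache, which the announcement as reported here does not spell out.

```python
# Promotional V4-Pro rates quoted in the article, in yuan per million tokens.
CACHE_HIT_INPUT_RATE = 0.025
INPUT_RATE = 3.0   # assumption: applies to input tokens that miss the cache
OUTPUT_RATE = 6.0

TOKENS_PER_UNIT = 1_000_000


def estimate_cost_yuan(cached_input_tokens, uncached_input_tokens, output_tokens):
    """Return the estimated cost in yuan for the given token counts.

    Hypothetical helper for illustration; not part of any DeepSeek SDK.
    """
    total = (
        cached_input_tokens * CACHE_HIT_INPUT_RATE
        + uncached_input_tokens * INPUT_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Example workload: 8M cached input, 2M uncached input, 1M output tokens.
    cost = estimate_cost_yuan(8_000_000, 2_000_000, 1_000_000)
    print(f"Estimated cost: {cost:.2f} yuan")
```

Under these assumptions, a workload whose input is mostly served from cache is dominated by its output cost, since the cache hit rate is a small fraction of the other two rates.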
DeepSeek released the V4-Pro and V4-Flash models in a preview capacity on April 24, marking the company’s first significant model launch since last December. V4-Pro features 1.6 trillion parameters, with 49 billion active parameters per inference pass, positioning it as the largest open-weight model currently available. In contrast, V4-Flash offers a smaller option with 284 billion parameters.
Prior to these discounts, V4-Pro’s standard pricing of $1.74 for input and $3.48 for output per million tokens was already approximately 98% less than OpenAI’s GPT-5.5 Pro rates. The latest cuts further widen this price differential.
DeepSeek’s pricing strategy is a response to rising computing power costs within the AI sector. Wei Sun, principal AI analyst at Counterpoint Research, told CNN that the company has implemented “the concept of ‘AI price reduction’” amid overall increases in industry costs.
DeepSeek’s V4 models run on Huawei Ascend hardware rather than Nvidia chips. The move is seen as significant for AI system development and deployment because it reduces dependence on Nvidia technology. Sun said the shift may accelerate domestic adoption and global AI advancement.
Additionally, V4-Pro requires only 27% of the computing power needed by its predecessor, V3.2, for a one-million-token context window. DeepSeek has acknowledged that V4 models currently lag behind frontier models like GPT-5.4 and Gemini 3.1 Pro by about three to six months in performance capabilities.





