← Back to Insights

DeepSeek Makes V4-Pro Price Cut Permanent: $0.87 Per Million Output Tokens, 34× Cheaper Than GPT-5.5

Nils Liu
DeepSeek API Pricing GenAI News

TL;DR

DeepSeek has made its 75% V4-Pro API discount permanent, pricing output tokens at $0.87 per million, 34× cheaper than GPT-5.5. This isn't just a price cut; it's a direct attack on Western AI pricing power.

DeepSeek Makes V4-Pro Price Cut Permanent: $0.87 Per Million Output Tokens, 34× Cheaper Than GPT-5.5

On May 22, DeepSeek announced it would not let its V4-Pro API discount expire.

What was supposed to be a promotional rate ending May 31 is now the permanent price. Output tokens at $0.87 per million. Input tokens at $0.435. Cache hits down to $0.003625, a 90% reduction from original pricing in some scenarios. For context: that puts DeepSeek V4-Pro at roughly 34 times cheaper than GPT-5.5 for output tokens.

What the Numbers Actually Mean

The old V4-Pro pricing: $1.74 per million input tokens, $3.48 per million output tokens. The new permanent pricing cuts both to one quarter. For engineering teams running tens of millions of API calls per day, this translates to 75-80% off their inference bill. Not a marginal saving.

V4-Pro is a Mixture-of-Experts model with 1.6 trillion total parameters, activating roughly 49 billion per inference task. It supports a 1 million-token context window. Across several benchmarks, it competes directly with GPT-5.5 and Claude Opus 4.7 at this price range.

How They Can Sustain This

The key is hardware. DeepSeek built V4-Pro natively for Huawei’s Ascend 950PR chips rather than Nvidia GPUs.

Western labs are still paying Nvidia H100/H200 prices. DeepSeek bypassed that supply chain. Counterpoint Research analyst Wei Sun noted that expanding Ascend production, targeting 2.5× last year’s shipment volume, is the structural reason DeepSeek can hold these prices. The MoE architecture compounds the advantage: each inference activates only about 3% of total parameters, which slashes per-token compute costs.

Developer Lock-In Is the Real Goal

The strategy parallels AWS in 2006. Amazon priced cloud compute at near-breakeven to capture developer workflows, then let switching costs do the rest. DeepSeek appears to be running the same playbook. Get production systems depending on the API now; raise prices or monetize differently later.

One telling detail: V4-Pro supports the Anthropic API interface format. Engineers already using Claude SDKs can point their existing code at DeepSeek with minimal changes. The migration barrier is low by design.

The Competitive Pressure

Current output pricing comparison:

ModelOutput per Million Tokens
DeepSeek V4-Pro$0.87
Gemini 3.5 Flash$9.00
GPT-5.5~$30.00
Claude Opus 4.7~$30.00

That gap is hard to argue against on cost alone. Western labs need to lean on other differentiators: data privacy, compliance, US government procurement requirements, or ecosystem lock-in. Those arguments exist, but they get harder to make as capability gaps narrow.

Anthropic has previously accused DeepSeek of distillation attacks, claiming the company used Claude outputs to train its own models. The allegation added geopolitical texture to an already tense competitive dynamic. In March, the Trump administration designated Anthropic as a “supply chain risk” and directed federal agencies to stop using Claude products. DeepSeek’s pricing move lands against that backdrop.

For developers weighing where to route workloads, the calculus shifted again this week. The question is no longer whether V4-Pro is worth testing. The question is whether the compliance and trust profile fits the use case.

If this was useful, subscribe to the newsletter for weekly AI PM insights and GenAI case studies.


Related Reading

Get the latest insights

Join the newsletter to receive my latest articles on GenAI, AI Agents, and architecture.

No spam. Unsubscribe anytime.