
DeepSeek V4 and Huawei Ascend 950DT: The Co-Designed Chip That Cut AI Inference Costs by 75%


Pandaily (China Startup/AI), 1 min read


A trace-level analysis by research firm SemiAnalysis has revealed that DeepSeek V4 and Huawei's Ascend 950DT AI accelerator were co-designed from the ground up, overturning the prevailing assumption that the model was merely adapted to run on domestic Chinese chips after development. The finding explains how DeepSeek was able to cut its V4-Pro API pricing by 75 percent, bringing the cost of inference per million tokens to just 0.20 yuan, roughly 50 times cheaper than competing Anthropic offerings, while remaining operationally profitable.
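The pricing figures above can be sanity-checked with a little arithmetic. The sketch below is only an illustration: it takes the three numbers quoted in the article (0.20 yuan per million tokens, a 75 percent cut, a roughly 50x gap) and derives the implied pre-cut price and the implied competitor price. The article does not say which price tier the figures refer to, so the derived values are inferences, and the function names are hypothetical.

```python
# Figures quoted in the article; everything derived from them is an inference.
NEW_PRICE_YUAN_PER_M_TOKENS = 0.20  # price per million tokens after the cut
PRICE_CUT_FRACTION = 0.75           # "75 percent"
COMPETITOR_MULTIPLE = 50            # "roughly 50 times cheaper"


def implied_old_price(new_price: float, cut: float) -> float:
    """Price before a fractional cut, given the price after it."""
    return new_price / (1.0 - cut)


def implied_competitor_price(new_price: float, multiple: float) -> float:
    """Competitor price implied by an 'N times cheaper' claim."""
    return new_price * multiple


old_price = implied_old_price(NEW_PRICE_YUAN_PER_M_TOKENS, PRICE_CUT_FRACTION)
competitor_price = implied_competitor_price(
    NEW_PRICE_YUAN_PER_M_TOKENS, COMPETITOR_MULTIPLE
)

print(f"Implied pre-cut price:    {old_price:.2f} yuan per million tokens")
print(f"Implied competitor price: {competitor_price:.2f} yuan per million tokens")
```

Under these assumptions the pre-cut price works out to about 0.80 yuan and the implied competitor price to about 10 yuan per million tokens.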

SemiAnalysis found that Huawei's CANN 8.5 software stack and the Ascend 950DT's dual-die UMA architecture, with its four specialized execution units, were built with DeepSeek's inference patterns in mind from the very beginning. The 950DT, codenamed David in Huawei's CANN source code, features HiZQ 2.0 memory delivering 144 GB of capacity and 4 TB per second of bandwidth, with dual-die unified memory access.

Its MC-squared technology merges communication primitives and compute into single kernels, eliminating the data transfer bottleneck that traditionally limits AI inference performance on competitor platforms. The results are stark: only CUDA and CANN had full day-zero support for DeepSeek V4 inference.
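To put the memory figures quoted above (144 GB of capacity, 4 TB per second of bandwidth) in perspective, the sketch below computes how long one full pass over the accelerator's memory would take at the quoted peak bandwidth. It is a back-of-the-envelope illustration, not a benchmark: it assumes decimal units (1 TB = 1000 GB) and that peak bandwidth is sustained, neither of which the article confirms. The function name is hypothetical.

```python
# Figures quoted in the article for the Ascend 950DT.
CAPACITY_GB = 144
BANDWIDTH_TB_PER_S = 4


def full_sweep_seconds(capacity_gb: float, bandwidth_tb_per_s: float,
                       gb_per_tb: float = 1000.0) -> float:
    """Time to read the whole memory once at peak bandwidth.

    Assumes decimal units and sustained peak bandwidth (both assumptions).
    """
    return capacity_gb / (bandwidth_tb_per_s * gb_per_tb)


sweep_ms = full_sweep_seconds(CAPACITY_GB, BANDWIDTH_TB_PER_S) * 1000.0
print(f"One full pass over memory: {sweep_ms:.1f} ms")
```

Under these assumptions a full pass takes about 36 ms, which gives a rough sense of why memory bandwidth matters for inference workloads that repeatedly read large amounts of model state.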

AMD's ROCm managed just one to two tokens per second, while NVIDIA's TRT-LLM suffered from a silent memory corruption bug that took weeks to diagnose. DeepSeek's share of token traffic surged from under one percent to 17 percent in May 2026, surpassing OpenAI for third place on Vercel AI Gateway metrics. This growth reflects the combined impact of dramatically lower pricing and reliable inference infrastructure.

ByteDance has locked in half of Ascend 950 production capacity, with Alibaba and Tencent also ordering tens of thousands of units. China Mobile purchased 776 Ascend node sets totaling 6,208 accelerators. The Ascend 950DT is scheduled for August 2026 cloud deployment on Huawei Cloud, signaling an irreversible shift from N
