AI DESK · HONG KONG · WEEKLY

Ascend 910B Cannot Clear the Agentic Floor

Google's agent era declaration is backed by B200 NVL72 inference clusters that BIS October 2023 controls have already placed beyond PRC labs' procurement reach.
AN

The Infrastructure Turn

Google's I/O declaration this week named the agentic era, and the declaration is backed by hardware. 5, and the "anything-to-anything" architecture each runs at inference densities that B200 NVL72 clusters sustain and H100 arrays only approximate. 8 billion to AI data centers, per Wired, and that figure is inference real estate at the same class of commitment Microsoft Azure made for OpenAI. The three major US labs each now have a cloud partner running inference at a scale that turns agent deployment from a conference demonstration into an enterprise product line. OpenAI's result cracking an 80-year combinatorics problem this week runs on the same cluster architecture that produces the API revenue that funds the next training run. The commercial loop closes on itself. A Hsinchu-based TSMC packaging engineer reading the next CoWoS-S capacity allocation knows what the supply curve looks like through 2027. A PRC cloud procurement officer reads the same allocation and files a request that BIS October 2023 controls will not permit.

The Ascend Ceiling Holds

The PRC side runs on Ascend 910B clusters, Huawei's best commercially available inference hardware, produced at SMIC, China's leading domestic chipmaker, on a 7nm process node, and the successor Ascend 910C offers partial training improvements without resolving the memory bandwidth constraint that enterprise-scale agentic inference requires. Baidu's ERNIE series, Zhipu's GLM series, and ByteDance's Doubao each run inference on Ascend 910B or domestically sourced equivalents with comparable throughput ceilings. BIS, the Commerce Department bureau that controls sensitive technology exports, placed B200-class hardware on restricted-entity provisions in October 2023 that PRC cloud providers cannot route through Singapore or Malaysian colocation facilities without triggering enforcement action. PRC labs have not fallen behind on model performance benchmarks. Baidu and Zhipu products have performed within range of their US equivalents on several academic evaluations. Enterprise agentic deployment requires inference density that sustains continuous multi-step reasoning across thousands of concurrent user sessions. That is a token throughput and memory bandwidth floor. Ascend 910B does not clear that floor. A PRC enterprise software buyer signing an agentic contract this quarter is committing to a throughput ceiling the vendor's sales deck does not name.

Baidu's Q2 2026 earnings call is scheduled for August. ByteDance does not report publicly. Those two capacity projections—session volumes, concurrent-user ceilings, Ascend 910B utilization rates—are the next verifiable data points. Several enterprise pilots are already running. The contracts have been signed. Whether the session volumes that make the unit economics work can be reached on current domestic hardware is a question those disclosures will either answer or defer. August arrives before Q3 closes.

PREVIOUS COLUMNS, AI DESK DESK