Alibaba looks to end reliance on Nvidia for AI inference

Alibaba has reportedly developed an AI accelerator amid growing pressure from Beijing to curb the nation’s reliance on Nvidia GPUs.
First reported by the Wall Street Journal Friday, the ecommerce giant’s latest chip is aimed specifically at AI inference, which refers to serving models as opposed to training them.
Alibaba’s T-Head division has been working on AI silicon for some time now. In 2019, it introduced the Hanguang 800. However, unlike modern chips from Nvidia and AMD, the part was primarily aimed at conventional machine learning models like ResNet — not the large language and diffusion models that power AI chatbots and image generators today.
The new chip, it’s reported, will be able to handle a more diverse set of workloads. Alibaba has become one of the leading developers of open models, with its Qwen3 family launched in April, so its initial focus on inference isn’t surprising. Serving models generally requires fewer resources than training them, making it a good place to start the transition to homegrown hardware. Alibaba is likely to continue using Nvidia accelerators for model training for the foreseeable future.
The Journal also reports that, unlike Huawei’s Ascend line of NPUs, Alibaba’s chip will be compatible with Nvidia’s software platform, allowing engineers to repurpose existing code. While that might sound like CUDA — Nvidia’s low-level programming platform for its GPUs — this is unlikely, and it isn’t necessary for inference anyway.
More likely, Alibaba is targeting higher-level abstraction layers like PyTorch or TensorFlow, which, for the most part, provide a hardware-agnostic programming interface. We say for the most part because there is still plenty of PyTorch code that calls libraries built exclusively for Nvidia hardware, though projects like OpenAI’s Triton have addressed many of these edge cases.
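To make that concrete, here’s a minimal sketch of what hardware-agnostic PyTorch inference looks like. The model code never names a vendor; only the device string changes. The "cuda" backend is real; any device string for Alibaba’s new part is purely our assumption, since the company hasn’t published a software stack for it.

    import torch

    # Select whichever accelerator the local PyTorch build exposes.
    # Vendors register their own device backends (Nvidia's is "cuda",
    # Huawei's torch_npu extension registers "npu"); a hypothetical
    # Alibaba backend would slot in the same way.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # The model definition is identical regardless of the device
    model = torch.nn.Linear(4096, 4096).to(device).eval()
    x = torch.randn(1, 4096, device=device)

    with torch.no_grad():  # inference only, so skip gradient tracking
        y = model(x)
    print(y.shape)  # torch.Size([1, 4096])

Huawei’s Ascend parts already plug into PyTorch this way via the torch_npu extension, and Triton similarly compiles Python-authored kernels for more than one GPU vendor, which is why code written at this level travels across hardware far more easily than hand-written CUDA.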
In any case, the chip will need to be built domestically due to US export controls on semiconductor tech, which prevent many Chinese companies from doing business with TSMC or Samsung Electronics.
The report doesn’t say which company will be tasked with fabbing the chip, but if we had to guess, it would be China’s Semiconductor Manufacturing International Corp (SMIC). SMIC also happens to be the fab responsible for pumping out Huawei’s Ascend family of NPUs.
Manufacturing isn’t the only challenge facing China’s domestic chip efforts. AI accelerators rely on large quantities of fast memory. Typically, this means high bandwidth memory (HBM), which, thanks to the US-China trade war, is also restricted: HBM2e and newer can’t be sold into China unless it’s already attached to a processor.
That means Alibaba’s chips will rely on slower GDDR or LPDDR memory, existing stockpiles of HBM3 and HBM2e, or older HBM2, which isn’t restricted, until Chinese memory vendors are ready to fill the void.
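For a sense of what’s at stake, here’s some back-of-envelope arithmetic using typical published interface specs. These are our illustrative figures, not anything attributed to Alibaba’s design: a single HBM3 stack moves roughly 13x more data than a single GDDR6 chip.

    # Rough per-device memory bandwidth from typical published pin speeds.
    # Illustrative figures only -- not Alibaba's actual memory config.
    def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
        """Aggregate bandwidth in GB/s for one memory interface."""
        return bus_width_bits * gbps_per_pin / 8

    hbm3 = bandwidth_gb_s(1024, 6.4)   # one HBM3 stack: ~819 GB/s
    gddr6 = bandwidth_gb_s(32, 16.0)   # one GDDR6 chip: 64 GB/s

    print(f"HBM3 stack: {hbm3:.0f} GB/s, GDDR6 chip: {gddr6:.0f} GB/s")

Chips that can’t get HBM typically compensate with very wide LPDDR or GDDR buses, more devices per package, or large pools of on-chip SRAM, all of which cost die area, power, or both.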
News of the homegrown silicon comes as the Chinese government pressures tech titans in the region not to use Nvidia’s H20 accelerators while also stoking fears of backdoors and remote kill switches. Nvidia, which was recently cleared to resume shipments of H20s to China, has denied the existence of any such features.
As we reported earlier this week, while Nvidia has the green light to resume shipments of the chip, the company doesn’t expect to realize revenues in the region this quarter while it waits for Uncle Sam to find his way through the maze of red tape required to implement a 15 percent export tax on AI chips bound for China.
Nvidia’s imminent return to the Middle Kingdom hasn’t stopped many of China’s AI flag bearers from looking for alternatives. Earlier this month, DeepSeek retuned its market-rattling models to run on a new generation of domestic silicon.
DeepSeek didn’t identify the supplier of the chips, but the company has reportedly failed to transition model training to Huawei’s Ascend accelerators.
Alibaba and Huawei aren’t the only ones working to end China’s reliance on Western silicon. Last month, EE Times China reported that Tencent-backed startup Enflame was developing a new AI chip called the L600, which would feature 144GB of on-chip memory capable of 3.6TB/s of bandwidth.
MetaX, meanwhile, has unveiled its C600, which will feature 144GB of HBM3e. However, it appears production may be limited by existing stockpiles of that memory.
Finally, Cambricon, another AI chip hopeful seen by some as China’s Nvidia, is also working on a homegrown accelerator, the Siyuan 690, which is widely expected to outperform Nvidia’s now three-year-old H100 accelerators.
The Register reached out to Alibaba for comment; we’ll let you know if we hear anything back. ®