Qualcomm announces AI accelerators and racks they’ll run in • The Register

Qualcomm has announced some details of its tilt at the AI datacenter market by revealing a pair of accelerators and rack scale systems to house them, all focused on inferencing workloads.

The company offered scant technical details about its new AI200 and AI250 “chip-based accelerator cards”, saying only that the AI200 supports 768 GB of LPDDR memory per card, and that the AI250 will offer an “innovative memory architecture based on near-memory computing” and represent “a generational leap in efficiency and performance for AI inference workloads by delivering greater than 10x higher effective memory bandwidth and much lower power consumption.”

Qualcomm will ship the cards in pre-configured racks that will use “direct liquid cooling for thermal efficiency, PCIe for scale up, Ethernet for scale out, confidential computing for secure AI workloads, and a rack-level power consumption of 160 kW.”
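For a sense of scale, here is a quick back-of-the-envelope calculation of what a 160 kW rack consumes over a year. The electricity price used is an illustrative assumption, not a figure from Qualcomm's announcement.

```python
# Back-of-the-envelope running cost for one 160 kW rack, assuming continuous
# operation. The electricity price is an illustrative assumption, not a
# figure from Qualcomm's announcement.
rack_power_kw = 160
hours_per_year = 24 * 365          # 8,760 hours
price_per_kwh = 0.08               # assumed US-dollar price; varies widely by region

energy_kwh = rack_power_kw * hours_per_year   # ~1.4 million kWh per year
cost_usd = energy_kwh * price_per_kwh

print(f"Energy per rack per year: {energy_kwh:,.0f} kWh")
print(f"Electricity cost at ${price_per_kwh}/kWh: ${cost_usd:,.0f}")
```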

In May, Qualcomm CEO Cristiano Amon offered somewhat cryptic statements that the company would only enter the AI datacenter market with “something unique and disruptive” and would use its expertise building CPUs to “think about clusters of inference that is about high performance at very low power.”

However, the house of Snapdragon’s announcement makes no mention of CPUs. It does say its accelerators build on Qualcomm’s “NPU technology leadership” – surely a nod to the Hexagon-branded neural processing units it builds into processors for laptops and mobile devices.

Qualcomm’s most recent Hexagon NPU, which it baked into the Snapdragon 8 Elite SoC, includes 12 scalar accelerators and eight vector accelerators, and supports INT2, INT4, INT8, INT16, FP8, FP16 precisions.
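Those reduced-precision formats are a big part of what makes NPU inference cheap: weights are stored and multiplied in narrow integer or float types rather than FP32. The snippet below is a generic sketch of symmetric INT8 weight quantization, purely for illustration; it says nothing about how Hexagon or the new cards actually implement it.

```python
# Generic sketch of symmetric INT8 weight quantization, the kind of
# reduced-precision trick NPUs rely on. Illustration only; not Qualcomm's
# implementation.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights onto the signed 8-bit range [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights to check the rounding error."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", float(np.max(np.abs(w - dequantize(q, scale)))))
```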

Perhaps the most telling clue in Qualcomm’s announcement is that its new AI products “offer rack-scale performance and superior memory capacity for fast generative AI inference at high performance per dollar per watt” and have “low total cost of ownership.”

That verbiage addresses three pain points for AI operators.

One is the cost of energy to power AI applications. Another is that high energy consumption produces a lot of heat, meaning datacenters need more cooling infrastructure – which also consumes energy and impacts cost.

The third is the quantity of memory available to accelerators, a factor that determines what models they can run – or how many models can run in a single accelerator.

The 768 GB of memory Qualcomm says it has packed into the AI200 is comfortably more than Nvidia or AMD offer in their flagship accelerators.
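As a rough guide to why that matters: a model’s weights need about one byte per parameter at 8-bit precision and two bytes at FP16, before counting the KV cache. The model sizes in the sketch below are illustrative assumptions, not anything Qualcomm has disclosed.

```python
# Rough weight-memory footprint at different precisions, to show why per-card
# capacity matters. Model sizes are illustrative assumptions, not Qualcomm figures.
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1e9

for params in (70, 180, 405):                      # hypothetical model sizes, in billions of parameters
    for fmt, bytes_per in (("FP16", 2), ("FP8/INT8", 1)):
        print(f"{params}B @ {fmt}: ~{weight_memory_gb(params, bytes_per):.0f} GB of weights")

# At 8-bit precision even a 405B-parameter model's weights (~405 GB) fit on a
# single 768 GB card, with headroom left for activations and KV cache.
```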

Qualcomm therefore appears to be suggesting its AI products can do more inferencing with fewer resources, a combination that will appeal to plenty of operators as (or if) adoption of AI workloads expands.

The house of Snapdragon also announced a customer for its new kit, namely Saudi AI outfit Humain, which “is targeting 200 megawatts starting in 2026 of Qualcomm AI200 and AI250 rack solutions to deliver high-performance AI inference services in the Kingdom of Saudi Arabia and globally.”

Qualcomm, however, says it expects the AI250 won’t be available until 2027. Humain’s announcement, like the rest of this news, is therefore hard to assess because it omits important details about exactly what Qualcomm has created and whether it will be truly competitive with other accelerators.

Also absent from Qualcomm’s announcement is whether major hyperscalers have expressed any interest in its kit, or whether it will be viable to run on-prem.

The announcement does, however, mark Qualcomm’s return to the datacenter after past forays focused on CPUs flopped. Investors clearly like this new move as the company’s share price popped 11 percent on Monday. ®
