Qualcomm is entering the AI accelerator market with a focus on inferencing in a move that will give enterprises more options beyond Nvidia, AMD and custom hyperscale cloud processors.
The company, which has been eyeing data center workloads, launched the Qualcomm AI200 and AI250 AI accelerators with what it calls "industry-leading total cost of ownership (TCO)."
Qualcomm also said its AI chips will be compatible with leading AI frameworks and software, and will feature a new memory architecture designed for AI workloads. The company plans an annual cadence for its AI inference roadmap.
According to Qualcomm, it will offer its AI200 and AI250 accelerator cards and racks. The effort builds on its neural processing unit (NPU) products and aims to deliver high performance per dollar per watt. Humain, a Saudi Arabian AI company, will be the first customer of Qualcomm's AI accelerators. The companies will integrate Humain's Allam AI models with Qualcomm's platform.

The AI200 and AI250 will be commercially available in 2026 and 2027, respectively.
Holger Mueller, an analyst at Constellation Research, said:
"Qualcomm has proven its expertise in chip performance over and over, more recently with the launch of its new Snapdragon series - and it certainly has a right to play in the up and coming AI inference marketg. Good to see the partnerhship with Humain, but breaking into the cloud data center is (very) hard - as we have seen from Dell, HPE and before the Microsoft partnership - for Nvidia."
Here's a look at the key points:
- Qualcomm AI200 is optimized for AI inference workloads and model performance. The rack system supports 768 GB of LPDDR per card for higher memory capacity at lower cost.
- Qualcomm AI250 features a memory architecture based on near-memory computing and can deliver 10x higher effective memory bandwidth and lower power consumption. Qualcomm said it is pushing toward disaggregated AI inferencing.
- Both racks feature direct liquid cooling, PCIe and Ethernet connectivity, and confidential computing support.
- Rack-level power consumption is 160 kW.
- The stack supports leading machine learning frameworks, inference engines, generative AI frameworks, and LLM / LMM inference optimization techniques like disaggregated serving (illustrated in the sketch after this list).
- The systems provide seamless model onboarding and one-click deployment of Hugging Face models via Qualcomm Technologies’ Efficient Transformers Library and Qualcomm AI Inference Suite.
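For context on what disaggregated serving means in practice, here is a minimal sketch of the idea using the open-source Hugging Face transformers library: the prompt "prefill" pass and the token-by-token "decode" pass are handled by separate workers, with the key-value (KV) cache handed off between them. The model name and the in-process hand-off are illustrative assumptions; this is not Qualcomm's stack, and real disaggregated deployments move the cache between machines over a network fabric.

```python
# Toy illustration of disaggregated serving: prefill and decode are split
# into separate workers that exchange the KV cache. Runs on CPU with a tiny
# placeholder model (an assumption chosen so the sketch is runnable).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "sshleifer/tiny-gpt2"  # hypothetical stand-in for a production LLM

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def prefill_worker(prompt):
    """Runs the full-prompt forward pass and returns the resulting KV cache."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, use_cache=True)
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
    return inputs["input_ids"], next_id, out.past_key_values

def decode_worker(input_ids, next_id, kv_cache, max_new_tokens=16):
    """Continues generation one token at a time from a received KV cache."""
    generated = [next_id]
    for _ in range(max_new_tokens - 1):
        with torch.no_grad():
            out = model(input_ids=generated[-1],
                        past_key_values=kv_cache,
                        use_cache=True)
        kv_cache = out.past_key_values
        generated.append(out.logits[:, -1].argmax(dim=-1, keepdim=True))
    return tok.decode(torch.cat([input_ids] + generated, dim=-1)[0])

# The prefill output (including the KV cache) is handed to the decode worker.
ids, first_token, cache = prefill_worker("Disaggregated serving splits prefill and decode")
print(decode_worker(ids, first_token, cache))
```

The point of the split is that prefill is compute-bound while decode is memory-bandwidth-bound, so the two stages can be scheduled on hardware sized for each; that is the kind of efficiency argument behind the disaggregated inferencing Qualcomm says it is pushing toward.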