Nvidia launched its H200 GPU, which offers faster memory and more bandwidth for generative AI workloads. The H200 will ship in the second quarter of 2024.
The launch comes as Wall Street waits for Nvidia's earnings and a read on whether the company can meet demand. In addition, Nvidia is about to see competition from AMD, while hyperscale cloud players have their own proprietary chips for model training.
Nvidia's H200 is the first GPU to feature HBM3e memory, which enables it to deliver 141GB of memory at 4.8 terabytes per second, a big jump in capacity and bandwidth relative to the Nvidia A100. The H200 is the foundation of the Nvidia HGX H200 AI computing platform, which is built on the company's Hopper architecture.
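For a rough sense of the jump, here is a back-of-envelope comparison. The A100 figures (80GB at roughly 2 terabytes per second) are taken from public spec sheets, not from Nvidia's H200 announcement:

```python
# Back-of-envelope comparison of H200 vs. A100 memory specs.
# A100 numbers (80 GB, ~2.0 TB/s) are assumed from public spec sheets,
# not from the H200 announcement itself.
h200 = {"memory_gb": 141, "bandwidth_tb_s": 4.8}
a100 = {"memory_gb": 80, "bandwidth_tb_s": 2.0}

capacity_gain = h200["memory_gb"] / a100["memory_gb"]             # ~1.76x
bandwidth_gain = h200["bandwidth_tb_s"] / a100["bandwidth_tb_s"]  # ~2.4x

print(f"Capacity: {capacity_gain:.2f}x, bandwidth: {bandwidth_gain:.2f}x")
```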
According to Nvidia, the H200 will nearly double inference speed on the Llama 2 large language model compared with the H100. Nvidia said future software updates will boost performance further.
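A plausible reason the bandwidth gain matters so much: at small batch sizes, LLM token generation is typically memory-bandwidth-bound, since producing each token requires streaming the model weights from memory. Here is a minimal sketch of that throughput ceiling, assuming Llama 2 70B in FP16 and the H100 SXM's publicly listed 3.35 terabytes per second; neither figure is part of this announcement:

```python
# Napkin math: when decoding is memory-bandwidth-bound, the throughput
# ceiling is bandwidth divided by bytes read per token (~ model weights).
# Model size, precision and H100 bandwidth are illustrative assumptions,
# not figures from Nvidia's H200 announcement.
def max_tokens_per_sec(bandwidth_tb_s: float, params_billion: float,
                       bytes_per_param: int = 2) -> float:
    weight_bytes = params_billion * 1e9 * bytes_per_param  # FP16 = 2 bytes/param
    return bandwidth_tb_s * 1e12 / weight_bytes

# Llama 2 70B at batch size 1:
for name, bw in [("H100 (~3.35 TB/s)", 3.35), ("H200 (4.8 TB/s)", 4.8)]:
    print(f"{name}: ~{max_tokens_per_sec(bw, 70):.0f} tokens/sec ceiling")
```

The raw bandwidth ratio is only about 1.4x, so it would not cover the full near-2x claim on its own; the larger memory also allows bigger batches and key-value caches per GPU, which is presumably where the rest of the gain comes from.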
The latest Nvidia GPU will be available from server makers including ASRock Rack, ASUS, Dell Technologies, Eviden, GIGABYTE, Hewlett Packard Enterprise, Ingrasys, Lenovo, QCT, Supermicro, Wistron and Wiwynn.
Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure will also deploy H200-based instances, as will CoreWeave, Lambda and Vultr.
One thing to watch going forward is whether Nvidia's GPU supply can meet demand. Another is which vendors get GPU allocations relative to others.
For instance, Supermicro CEO Charles Liang said on the company's fiscal first quarter earnings call:
"We navigated tight AI GPU and key components supply conditions to deliver total solutions and large compute clusters, especially for generative AI workloads where our backorders continue to expand faster than our forecast.
During the first quarter, demand for our leading AI platforms in plug-and-play rack-scale, especially for the LLM-optimized NVIDIA HGX-H100 solutions, was the primary growth driver.
Many customers have started to request direct-attached cold-plate liquid-cooling solutions to address the energy costs, power grid constraints and thermal challenges of these new GPU infrastructures."