Advancements in NVIDIA Platforms: Leading Performance in AI and Data Center Benchmarks


As enterprises rush to adopt generative AI and bring new services to market, the demands on data center infrastructure have surged. Training large language models (LLMs) is one challenge; serving LLM-powered applications in real time is another. In the latest round of the MLPerf Inference v4.1 industry benchmarks, NVIDIA platforms delivered leading performance across all data center tests.

NVIDIA’s Benchmark Achievements

The first-ever submission of the upcoming NVIDIA Blackwell platform delivered up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf's largest LLM workload, Llama 2 70B, a gain driven by Blackwell's second-generation Transformer Engine and FP4 Tensor Cores. The NVIDIA H200 Tensor Core GPU also posted outstanding results on every data center benchmark, including the Mixtral 8x7B mixture-of-experts (MoE) LLM, which has 46.7 billion parameters in total, with 12.9 billion active per token.
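The arithmetic behind those Mixtral figures is worth a quick check. The sketch below is a minimal Python illustration that uses only the two parameter counts quoted above, together with Mixtral's routing of 2 of 8 experts per token, to solve for the implied shared-versus-per-expert split. The derived split is an illustration of why MoE keeps per-token compute low, not Mixtral's published layer breakdown.

```python
# Back-of-the-envelope check that Mixtral 8x7B's totals are self-consistent:
# 8 expert FFNs, 2 routed to per token. The two totals come from the article;
# the shared/per-expert split below is derived, not an official breakdown.

TOTAL_PARAMS = 46.7e9    # all weights (shared layers + all 8 experts)
ACTIVE_PARAMS = 12.9e9   # weights touched per token (shared + 2 experts)
NUM_EXPERTS = 8
EXPERTS_PER_TOKEN = 2

# total  = shared + 8 * per_expert
# active = shared + 2 * per_expert
per_expert = (TOTAL_PARAMS - ACTIVE_PARAMS) / (NUM_EXPERTS - EXPERTS_PER_TOKEN)
shared = TOTAL_PARAMS - NUM_EXPERTS * per_expert

print(f"per-expert params: {per_expert / 1e9:.2f}B")  # ~5.63B
print(f"shared params:     {shared / 1e9:.2f}B")      # ~1.63B
```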

The Growing Need for Compute Power

As LLM deployments grow, so does the compute required to serve inference requests. Meeting real-time latency targets while serving many users at once requires multi-GPU systems. NVIDIA NVLink and NVSwitch provide high-bandwidth GPU-to-GPU communication on the NVIDIA Hopper architecture, a significant benefit for real-time, cost-effective large model inference. The Blackwell platform will extend NVLink Switch capabilities further, with larger NVLink domains of 72 GPUs.
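To make the capacity argument concrete, here is a rough sizing sketch in Python. The Llama 2 70B attention geometry used below (80 layers, 8 grouped-query KV heads of dimension 128) reflects that model's published configuration, while the serving scenario (FP16 KV cache, 4K-token contexts, 64 concurrent users) is a hypothetical chosen purely for illustration.

```python
# Rough sizing sketch: why real-time Llama 2 70B serving wants multiple
# NVLink-connected GPUs. The 4K-context, 64-user scenario is hypothetical.

GiB = 1024**3

params = 70e9
weights_fp16 = params * 2 / GiB   # ~130 GiB: exceeds a single 80 GB GPU
weights_fp8 = params * 1 / GiB    # ~65 GiB: fits, but leaves little headroom

layers, kv_heads, head_dim = 80, 8, 128
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2  # K and V, FP16
kv_per_user = kv_bytes_per_token * 4096 / GiB              # 4K-token context

print(f"FP16 weights: {weights_fp16:.0f} GiB, FP8 weights: {weights_fp8:.0f} GiB")
print(f"KV cache per user @4K ctx: {kv_per_user:.2f} GiB")
print(f"64 concurrent users:       {64 * kv_per_user:.0f} GiB of KV cache")
```

Even with quantized weights, the KV cache alone for a few dozen concurrent users can consume an entire GPU's memory, which is why sharding a model across an NVLink domain matters for both capacity and latency.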

Software Innovations and Performance Gains

NVIDIA platforms are continuously enhanced through relentless software development. In the most recent inference round, NVIDIA offerings—including the NVIDIA Hopper architecture, NVIDIA Jetson platform, and NVIDIA Triton Inference Server—showed substantial performance improvements. For instance, the H200 GPU delivered up to 27% more generative AI inference performance compared to the previous round, providing customers with ongoing value.
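As one concrete entry point to that software stack, the snippet below shows a minimal client for NVIDIA Triton Inference Server over HTTP, using the official tritonclient package (pip install "tritonclient[http]"). The model name my_model and its tensor names and shapes are placeholders; they must match the configuration of whatever model is actually deployed on the server.

```python
# Minimal Triton Inference Server HTTP client sketch. Assumes a Triton
# instance on localhost:8000 serving a model named "my_model" whose config
# declares an FP32 input "INPUT0" of shape [1, 16] and an output "OUTPUT0".

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
assert client.is_server_ready(), "Triton server is not ready"

batch = np.random.rand(1, 16).astype(np.float32)
infer_input = httpclient.InferInput("INPUT0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

result = client.infer(model_name="my_model", inputs=[infer_input])
print(result.as_numpy("OUTPUT0"))
```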

Edge Deployments and Future Prospects

Generative AI models deployed at the edge can turn sensor data, such as images and video, into real-time, actionable insights. The NVIDIA Jetson platform for edge AI and robotics can run a diverse range of models locally. In the latest MLPerf benchmarks, NVIDIA Jetson AGX Orin system-on-modules achieved more than a 6.2x throughput improvement and a 2.4x latency improvement on the GPT-J LLM workload.
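For a sense of what running such a model locally looks like, the sketch below loads the GPT-J 6B checkpoint that MLPerf's GPT-J workload is based on and generates a summary with Hugging Face Transformers. NVIDIA's Jetson submissions run an optimized inference stack rather than this generic path; this is only an illustration of the model class on a device with a capable GPU.

```python
# Illustrative local run of GPT-J 6B with Hugging Face Transformers.
# This is NOT the MLPerf harness or NVIDIA's optimized Jetson stack;
# it just demonstrates the kind of on-device summarization described above.

import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="EleutherAI/gpt-j-6b",
    torch_dtype=torch.float16,  # halve weight memory for edge-class GPUs
    device_map="auto",          # place weights on the available accelerator
)

result = generator(
    "Summarize: The camera feed shows a forklift blocking aisle three.",
    max_new_tokens=48,
    do_sample=False,            # deterministic output for a quick sanity check
)
print(result[0]["generated_text"])
```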

Conclusion

This round of MLPerf Inference demonstrated NVIDIA's versatile, leading performance across all benchmarks, from the data center to the edge. For more detail on these results, see the technical blog. H200 GPU-powered systems are available today from CoreWeave and from server makers including ASUS, Dell Technologies, HPE, and Supermicro.
