Generative AI applications that use text, computer code, protein chains, summaries, video and even 3D graphics require data-center-scale accelerated computing to efficiently train the large language models (LLMs) that power them.
In the MLPerf Training 4.1 industry benchmarks, the NVIDIA Blackwell platform delivered impressive results on workloads across all tests, with up to 2.2x more performance per GPU on LLM benchmarks, including Llama 2 70B fine-tuning and GPT-3 175B pretraining.
In addition, NVIDIA's submissions on the NVIDIA Hopper platform continued to hold at-scale records on all benchmarks, including a submission with 11,616 Hopper GPUs on the GPT-3 175B benchmark.
Leaps and Bounds With Blackwell
The first Blackwell training submission to the MLCommons Consortium, which creates standardized, unbiased and rigorously peer-reviewed testing for industry participants, highlights how the architecture is advancing generative AI training performance.
For instance, the architecture includes new kernels that make more efficient use of Tensor Cores. Kernels are optimized, purpose-built math operations, like matrix multiplies, that are at the heart of many deep learning algorithms.
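To make the idea concrete, here is a minimal NumPy sketch of the tiled matrix multiply that such kernels implement. This is illustrative only, not NVIDIA code: real Tensor Core kernels fuse tiling, mixed precision and accumulation in hardware, but the blocked access pattern below is the same basic structure they optimize.

```python
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 32) -> np.ndarray:
    """Blocked (tiled) matrix multiply: the access pattern GPU kernels
    use so each tile can stay resident in fast on-chip memory."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=np.float32)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # accumulate one tile-sized partial product
                c[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return c

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 48)).astype(np.float32)
b = rng.standard_normal((48, 96)).astype(np.float32)
print(np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3))  # True
```

On a GPU, each tile-sized partial product maps to a thread block, and Tensor Cores execute the inner tile multiply-accumulate as a single hardware instruction, which is why kernel-level optimization matters so much for training throughput.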
Blackwell's higher per-GPU compute throughput and significantly larger and faster high-bandwidth memory allow it to run the GPT-3 175B benchmark on fewer GPUs while still achieving excellent per-GPU performance.
Taking advantage of larger, higher-bandwidth HBM3e memory, just 64 Blackwell GPUs were able to run the GPT-3 LLM benchmark without compromising per-GPU performance. The same benchmark run using Hopper needed 256 GPUs.
The Blackwell training results follow an earlier submission to MLPerf Inference 4.1, where Blackwell delivered up to 4x more LLM inference performance versus the Hopper generation. Taking advantage of the Blackwell architecture's FP4 precision, along with the NVIDIA QUASAR Quantization System, the submission demonstrated powerful performance while meeting the benchmark's accuracy requirements.
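As a rough illustration of the principle behind low-precision formats, here is a toy sketch of symmetric 4-bit quantization. This is not the QUASAR system, and it ignores FP4's actual floating-point exponent/mantissa layout; it only shows the core idea of representing a tensor with 16 levels plus a scale factor.

```python
import numpy as np

def quantize_4bit(x: np.ndarray):
    """Symmetric 4-bit integer quantization: map floats to 16 levels
    plus one per-tensor scale. Illustrative only; real FP4 is a
    floating-point format, not uniform integer quantization."""
    scale = float(np.max(np.abs(x))) / 7.0  # use symmetric range [-7, 7]
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from quantized levels."""
    return q.astype(np.float32) * scale

x = np.array([0.1, -0.5, 0.9, -1.2], dtype=np.float32)
q, s = quantize_4bit(x)
x_hat = dequantize(q, s)
print(np.max(np.abs(x - x_hat)))  # small reconstruction error
```

The payoff of any 4-bit representation is the same in spirit: weights and activations take a quarter of the memory of 16-bit formats and move through the memory system four times faster, provided the accuracy loss stays within the benchmark's requirements.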
Relentless Optimization
NVIDIA platforms undergo continuous software development, racking up performance and feature improvements in training and inference across a wide variety of frameworks, models and applications.
In this round of MLPerf training submissions, Hopper delivered a 1.3x improvement in GPT-3 175B per-GPU training performance since the introduction of the benchmark.
NVIDIA also submitted large-scale results on the GPT-3 175B benchmark using 11,616 Hopper GPUs connected with NVIDIA NVLink and NVSwitch high-bandwidth GPU-to-GPU communication and NVIDIA Quantum-2 InfiniBand networking.
NVIDIA Hopper GPUs have more than tripled scale and performance on the GPT-3 175B benchmark since last year. In addition, on the Llama 2 70B LoRA fine-tuning benchmark, NVIDIA increased performance by 26% using the same number of Hopper GPUs, reflecting continued software improvements.
NVIDIA's ongoing work on optimizing its accelerated computing platforms enables continued improvements in MLPerf test results, driving performance up in containerized software, bringing more powerful computing to partners and customers on existing platforms and delivering more return on their platform investment.
Partnering Up
NVIDIA partners, including system makers and cloud service providers like ASUSTek, Azure, Cisco, Dell, Fujitsu, Giga Computing, Lambda Labs, Lenovo, Oracle Cloud, Quanta Cloud Technology and Supermicro, also submitted impressive results to MLPerf in this latest round.
A founding member of MLCommons, NVIDIA sees the role of industry-standard benchmarks and benchmarking best practices in AI computing as vital. With access to peer-reviewed, streamlined comparisons of AI and HPC platforms, companies can keep pace with the latest AI computing innovations and access critical data that can help guide important platform investment decisions.
Learn more about the latest MLPerf results on the NVIDIA Technical Blog.