NVIDIA H100 GPU HBM3 94GB 350W PCIE/NVL

NVIDIA H100 NVL: Revolutionizing Large Language Model Deployment

The NVIDIA H100 NVL is designed to meet the demands of deploying massive large language models (LLMs) like ChatGPT at scale, offering unmatched performance and memory capacity tailored for AI workloads.

Memory and Bandwidth for LLMs

The H100 NVL features a full 6144-bit memory interface per GPU (1024 bits per HBM3 stack, with all six stacks enabled), with memory speeds reaching up to 5.1 Gbps per pin. This works out to roughly 3.9 TB/s of bandwidth per GPU, or 7.8 TB/s combined across the dual-GPU card, more than double the 3.35 TB/s of the H100 SXM. For LLMs, which require large buffers and high bandwidth, this headroom can significantly enhance performance.
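
This figure follows directly from the bus width and data rate. The short calculation below reproduces it using only the numbers quoted above (illustrative arithmetic, not a vendor tool):

```python
# Peak HBM3 bandwidth = bus width (bits) x data rate (Gbps per pin) / 8 bits per byte
bus_width_bits = 6144        # per GPU: six stacks x 1024 bits each
data_rate_gbps = 5.1         # per pin
per_gpu_gbs = bus_width_bits * data_rate_gbps / 8      # GB/s per GPU
print(f"Per GPU:  {per_gpu_gbs / 1000:.1f} TB/s")      # ~3.9 TB/s
print(f"Per card: {2 * per_gpu_gbs / 1000:.1f} TB/s")  # ~7.8 TB/s across both GPUs
```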

Unparalleled LLM Deployment Capabilities

Each H100 NVL card integrates 94 GB of usable HBM3 memory per GPU (of 96 GB physically on board), for a total of 188 GB across its dual-GPU configuration. With its dual-GPU NVLink interconnect, the H100 NVL can serve GPT-3-class models of up to 175 billion parameters in real time.
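
To see why the 188 GB pool matters, consider the rough footprint of the model weights alone. The sketch below is back-of-the-envelope arithmetic only; it ignores activations, KV cache, and runtime overhead:

```python
# Approximate weight-only memory for a 175B-parameter model at common precisions.
params = 175e9
card_capacity_gb = 188   # total HBM3 on one H100 NVL card
for precision, bytes_per_param in [("FP16/BF16", 2), ("FP8/INT8", 1)]:
    weights_gb = params * bytes_per_param / 1e9
    verdict = "fits on" if weights_gb <= card_capacity_gb else "exceeds"
    print(f"{precision}: {weights_gb:.0f} GB of weights ({verdict} one 188 GB card)")
```

At FP16 the weights alone come to 350 GB and overflow a single card, while Hopper's FP8 path brings them down to 175 GB, inside the 188 GB budget.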

For large-scale deployment, a single server with four H100 NVL cards can deliver up to 10x the throughput of a previous-generation DGX A100 server with eight GPUs, making it ideal for customers aiming to scale their LLM infrastructure rapidly.

Dual-GPU Design with NVLink

The H100 NVL introduces NVIDIA’s first dual-GPU design in years, specifically engineered for data centers and AI workloads. This setup consists of two PCIe cards connected via three NVLink Gen4 bridges, enabling seamless GPU-to-GPU communication.
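
In software, the two GPUs on a card appear as two CUDA devices whose peer-to-peer path runs over the NVLink bridges. As a minimal sketch, assuming a host where both GPUs are visible and PyTorch with CUDA is installed, peer access can be verified like this:

```python
# Check CUDA peer-to-peer access between the two GPUs of an H100 NVL card.
# Device indices 0 and 1 are assumptions for a single-card system.
import torch

assert torch.cuda.device_count() >= 2, "need two visible GPUs"
for src, dst in [(0, 1), (1, 0)]:
    ok = torch.cuda.can_device_access_peer(src, dst)
    print(f"GPU {src} -> GPU {dst}: peer access {'available' if ok else 'unavailable'}")

# A direct device-to-device copy takes the peer path when it is available.
x = torch.randn(1 << 20, device="cuda:0")
y = x.to("cuda:1")   # GPU 0 -> GPU 1 transfer, over NVLink when P2P is enabled
print(y.device, y.shape)
```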

While the H100 NVL doesn’t introduce new architectural features beyond Hopper’s Transformer Engine, its high memory capacity and NVLink connectivity make it a standout for LLM deployments.

Memory Capacity: A Competitive Edge

The H100 NVL offers 188 GB of HBM3 memory, the largest memory capacity available in the Hopper lineup. This capacity ensures optimal performance for LLM inference and other memory-intensive workloads, cementing its position as the most powerful PCIe H100 variant.

Performance Highlights

  • 12x GPT-3 Inference Throughput: Compared to the previous-generation HGX A100 (eight H100 NVL GPUs vs. eight A100 GPUs), the H100 NVL delivers 12 times the inference throughput for GPT-3-175B.

  • Hopper Architecture Advantage: Powered by Hopper’s Transformer Engine, the H100 NVL delivers significant performance gains in LLM tasks.

Purpose-Built for LLMs

The H100 NVL is not just another GPU; it’s a purpose-built solution for scaling AI language models. With its unmatched memory, bandwidth, and dual-GPU design, the H100 NVL is poised to redefine large-scale AI model deployment for enterprises and researchers alike.

NVIDIA H100 NVL vs. NVIDIA H100 GPU HBM3 PCI-E: Tailored for Different AI Demands

The NVIDIA H100 NVL and H100 GPU HBM3 PCI-E represent two cutting-edge options in NVIDIA's Hopper lineup, each optimized for distinct use cases in AI and high-performance computing.


NVIDIA H100 NVL: Purpose-Built for Multi-GPU AI Clusters

  • High-Speed GPU Interconnect: Equipped with NVLink Gen4, the H100 NVL enables ultra-fast GPU-to-GPU communication, making it the go-to solution for dense, multi-GPU AI clusters (a minimal multi-GPU sketch follows this list).

  • Memory and Bandwidth: Features 188 GB of HBM3 memory (94 GB per GPU) on a 6144-bit memory interface per GPU, delivering up to 7.8 TB/s of combined bandwidth, more than double the throughput of the H100 SXM.

  • Large Language Model Optimization: Specifically designed for deploying massive LLMs like ChatGPT, and capable of handling models of up to 175 billion parameters in real time. Four H100 NVL cards in a single server can achieve up to 10x the speed of an eight-GPU DGX A100 system.

  • Power and Cooling: With a 350W TDP, the H100 NVL requires robust cooling solutions to maintain performance in high-density deployments.

  • Key Use Cases: Ideal for cutting-edge AI research, LLM deployment, and environments where NVLink’s high-speed interconnect is critical for scalability.
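
The communication pattern these clusters typically run is a collective such as all-reduce. The hedged sketch below uses PyTorch’s NCCL backend, which rides NVLink where it is available; the torchrun launch command and two-GPU world size are assumptions for illustration, not part of the product specification:

```python
# Minimal NCCL all-reduce across local GPUs.
# Launch with: torchrun --nproc_per_node=2 allreduce_demo.py  (filename is arbitrary)
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")   # one process per GPU; NCCL uses NVLink when present
rank = dist.get_rank()
torch.cuda.set_device(rank)

x = torch.ones(1024, device="cuda") * (rank + 1)      # distinct payload per rank
dist.all_reduce(x, op=dist.ReduceOp.SUM)              # summed in place across all ranks
print(f"rank {rank}: first element = {x[0].item()}")  # 3.0 with two ranks (1 + 2)

dist.destroy_process_group()
```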


NVIDIA H100 GPU HBM3 PCI-E: Flexibility Meets High Performance

  • Broad Compatibility: Operates on the PCIe interface, offering compatibility with a wider range of systems and avoiding the need for NVLink setups.

  • Advanced Memory Technology: Utilizes 94 GB of HBM3 memory, combining the power of the Hopper architecture with the flexibility of PCIe.

  • Balanced Power and Cooling: Rated at 350W TDP, requiring robust cooling similar to the H100 NVL, but without the added complexity of NVLink-based systems.

  • Key Use Cases: Ideal for enterprises seeking advanced AI acceleration with plug-and-play compatibility in existing infrastructure.


Comparative Highlights

| Feature | NVIDIA H100 NVL | NVIDIA H100 GPU HBM3 PCI-E |
| --- | --- | --- |
| Memory Capacity | 188 GB HBM3 (94 GB per GPU) | 94 GB HBM3 |
| Interface | NVLink with PCIe | PCIe |
| Bandwidth | Up to 7.8 TB/s | High-performance PCIe bandwidth |
| Power (TDP) | 350W | 350W |
| Key Advantage | Optimized for multi-GPU NVLink setups | Broader system compatibility |
| Primary Use Case | Large-scale AI and dense GPU clusters | Enterprise AI and flexible infrastructure |

Choosing the Right GPU for Your Needs

  • H100 NVL: For environments prioritizing large-scale LLMs, high memory bandwidth, and multi-GPU setups, the NVL delivers unmatched performance and scalability.

  • H100 GPU HBM3 PCI-E: For enterprises looking for high-performance GPUs without committing to NVLink infrastructure, the PCI-E variant provides the perfect balance of power and compatibility.

Both GPUs represent the forefront of NVIDIA's Hopper architecture, empowering organizations to tackle AI and HPC challenges with efficiency and precision.

Specifications

H100 NVL figures are combined totals for the dual-GPU card.

| Specification | H100 SXM | H100 PCIe | H100 NVL |
| --- | --- | --- | --- |
| FP64 | 34 teraFLOPS | 26 teraFLOPS | 68 teraFLOPS |
| FP64 Tensor Core | 67 teraFLOPS | 51 teraFLOPS | 134 teraFLOPS |
| FP32 | 67 teraFLOPS | 51 teraFLOPS | 134 teraFLOPS |
| TF32 Tensor Core | 989 teraFLOPS | 756 teraFLOPS | 1,979 teraFLOPS |
| BFLOAT16 Tensor Core | 1,979 teraFLOPS | 1,513 teraFLOPS | 3,958 teraFLOPS |
| FP16 Tensor Core | 1,979 teraFLOPS | 1,513 teraFLOPS | 3,958 teraFLOPS |
| FP8 Tensor Core | 3,958 teraFLOPS | 3,026 teraFLOPS | 7,916 teraFLOPS |
| INT8 Tensor Core | 3,958 TOPS | 3,026 TOPS | 7,916 TOPS |
| GPU memory | 80 GB | 80 GB | 188 GB |
| GPU memory bandwidth | 3.35 TB/s | 2 TB/s | 7.8 TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG | 14 NVDEC, 14 JPEG |
| Max thermal design power (TDP) | Up to 700W (configurable) | 300-350W (configurable) | 2x 350-400W (configurable) |
| Multi-Instance GPUs | Up to 7 MIGs @ 10 GB each | Up to 7 MIGs @ 10 GB each | Up to 14 MIGs @ 12 GB each |
| Form factor | SXM | PCIe | 2x PCIe |
| Interconnect | NVLink: 900 GB/s; PCIe Gen5: 128 GB/s | Dual-slot air-cooled; NVLink: 600 GB/s; PCIe Gen5: 128 GB/s | Dual-slot air-cooled; NVLink: 600 GB/s; PCIe Gen5: 128 GB/s |
| Server options | NVIDIA HGX H100 Partner and NVIDIA-Certified Systems with 4 or 8 GPUs; NVIDIA DGX H100 with 8 GPUs | Partner and NVIDIA-Certified Systems with 1-8 GPUs | Partner and NVIDIA-Certified Systems with 2-4 pairs |
| NVIDIA AI Enterprise | Add-on | Included | Add-on |
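
To check memory capacity and Multi-Instance GPU (MIG) mode on installed hardware, NVIDIA's NVML management library can be queried from Python. The sketch below is a minimal example assuming the nvidia-ml-py package (which provides the pynvml module) and at least one visible NVIDIA GPU; exact return types can vary slightly across driver versions:

```python
# Query device name, total memory, and MIG mode via NVML.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {i}: {name}, {mem.total / 1e9:.0f} GB total")
    try:
        current, pending = pynvml.nvmlDeviceGetMigMode(handle)
        print(f"  MIG mode: current={current}, pending={pending}")
    except pynvml.NVMLError:
        print("  MIG not supported on this device")
pynvml.nvmlShutdown()
```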