Announcing NVIDIA DGX GH200: The First 100 Terabyte GPU Memory System

NVIDIA DGX GH200: The First 100 Terabyte GPU Memory System

  • NVIDIA introduces its newest GPU system, the DGX GH200, the first supercomputer to break the 100-terabyte barrier for memory accessible to GPUs over NVLink.
  • The DGX GH200 system is designed to empower scientists in need of an advanced platform that can solve extraordinary challenges.
  • The system is built with the NVIDIA Grace Hopper Superchip and the NVLink Switch System, which combine to form a giant, data-center-sized GPU.
  • The architecture supports up to 256 GPUs in a DGX GH200 system, providing nearly 500x more memory to the GPU shared memory programming model over NVLink than a single NVIDIA DGX A100 320 GB system (see the Unified Memory sketch after this list).
  • NVIDIA Base Command provides an OS optimized for AI workloads, a cluster manager, and libraries that accelerate compute, storage, and network infrastructure.
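
At the programming level, the "GPU shared memory programming model over NVLink" referenced above is exposed through CUDA Unified Memory. The following is a minimal sketch under that assumption, not code from the announcement: a single managed allocation can exceed one GPU's local HBM, and the CUDA runtime places and migrates pages across the NVLink-attached memory. The 64 GiB buffer size is a placeholder; adjust it to your system.

```cuda
// Minimal Unified Memory sketch (assumed illustration, not NVIDIA's code).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void init(float *data, size_t n, float v) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = v;
}

__global__ void scale(float *data, size_t n, float factor) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    // Hypothetical size: pick something larger than a single GPU's HBM on
    // your system to exercise oversubscription (64 GiB of floats here).
    const size_t n = 1ull << 34;
    float *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));

    const int threads = 256;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);
    init<<<blocks, threads>>>(data, n, 1.0f);   // first touch on the GPU
    scale<<<blocks, threads>>>(data, n, 2.0f);  // same pointer, no copies
    cudaDeviceSynchronize();

    printf("data[0] = %f (expected 2.0)\n", data[0]);  // page migrates to CPU
    cudaFree(data);
    return 0;
}
```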

NVLink Technology and Unified Memory Programming Model

  • NVLink technology, introduced in 2016, works together with the CUDA Unified Memory programming model (introduced with CUDA 6) to increase the memory available to GPU-accelerated workloads.
  • The core of every DGX system is a GPU complex on a baseboard interconnected with NVLink, allowing each GPU to access any other GPU's memory at NVLink speed (see the peer-access sketch below).
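
The GPU-to-GPU access described above is exposed through CUDA peer-to-peer access. Below is a minimal sketch, assuming a node with at least two NVLink-connected GPUs; the device indices 0 and 1 are placeholders. Once peer access is enabled, a kernel running on GPU 0 can dereference a pointer allocated on GPU 1, with the loads and stores traveling over NVLink.

```cuda
// Minimal peer-access sketch (assumed illustration for a two-GPU node).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fill(float *buf, size_t n, float v) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = v;
}

int main() {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, /*device=*/0, /*peerDevice=*/1);
    if (!canAccess) { printf("GPU 0 cannot access GPU 1\n"); return 1; }

    const size_t n = 1 << 20;
    float *remote = nullptr;

    cudaSetDevice(1);                        // allocate on GPU 1
    cudaMalloc(&remote, n * sizeof(float));

    cudaSetDevice(0);                        // run on GPU 0
    cudaDeviceEnablePeerAccess(1, 0);        // map GPU 1's memory into GPU 0
    fill<<<(unsigned)((n + 255) / 256), 256>>>(remote, n, 3.0f);  // writes cross NVLink
    cudaDeviceSynchronize();

    float host = 0.0f;
    cudaMemcpy(&host, remote, sizeof(float), cudaMemcpyDefault);
    printf("remote[0] = %f\n", host);

    cudaSetDevice(1);
    cudaFree(remote);
    return 0;
}
```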

NVIDIA Grace Hopper Superchip and NVLink Switch System

  • NVIDIA Grace Hopper Superchip combines the Grace and Hopper architectures using NVIDIA NVLink-C2C to deliver a CPU + GPU coherent memory model.
  • The Grace CPU and Hopper GPU are interconnected with NVLink-C2C, which provides 7x more bandwidth than PCIe Gen5 at one-fifth the power.
  • NVLink Switch System forms a two-level, non-blocking, fat-tree NVLink fabric to fully connect 256 Grace Hopper Superchips in a DGX GH200 system.
  • Every GPU in DGX GH200 can access the memory of other GPUs and the extended GPU memory of all NVIDIA Grace CPUs at 900 GB/s (a bandwidth-measurement sketch follows this list).
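
The 900 GB/s figure is NVIDIA's stated NVLink bandwidth. As a rough check of what a particular GPU pair actually delivers, the sketch below times repeated peer-to-peer copies with CUDA events; it assumes at least two visible GPUs and simply reports whatever bandwidth the local topology achieves.

```cuda
// Rough peer-to-peer bandwidth check (assumed illustration; no warmup,
// so treat the reported number as indicative only).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1ull << 30;         // 1 GiB per transfer
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0); cudaMalloc(&src, bytes);
    cudaSetDevice(1); cudaMalloc(&dst, bytes);

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);        // allow direct GPU-to-GPU traffic

    cudaEvent_t start, stop;
    cudaEventCreate(&start); cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < 10; ++i)
        cudaMemcpyPeerAsync(dst, 1, src, 0, bytes, 0);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbps = (10.0 * bytes) / (ms / 1000.0) / 1e9;
    printf("GPU0 -> GPU1: %.1f GB/s\n", gbps);

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(src); cudaSetDevice(1); cudaFree(dst);
    return 0;
}
```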

Full-Stack NVIDIA Solution

  • DGX GH200 comes with NVIDIA Base Command, which includes an OS optimized for AI workloads, a cluster manager, and libraries that accelerate compute, storage, and network infrastructure.
  • BlueField-3 DPUs can transform any enterprise computing environment into a secure and accelerated virtual private cloud, enabling organizations to run application workloads in secure, multi-tenant environments.

Supercharging AI and HPC Workloads

  • Paired with BlueField-3 DPUs, the DGX H100 remains the most performance-efficient training solution for mainstream enterprise workloads.
  • The DGX GH200 is better suited to advanced AI and HPC models that require massive memory for GPU shared memory programming.
  • NVIDIA is working to make DGX GH200 available at the end of this year.