Hardware

Baskerville is built using Lenovo SD650 V2 liquid-cooled compute systems. There are 46 compute systems, each equipped with 4x NVIDIA A100 (40GB) GPUs installed on a liquid-cooled HGX-100 board. Each compute system is fitted with 2x 36-core Intel 8360Y CPUs (3rd Generation Intel Xeon Scalable / “Ice Lake”), 512GB RAM (16x 32GB DDR4), and a local ~1TB NVMe device that provides in-node scratch storage. The HGX-100 board is connected to the system CPU planar using PCIe gen 4.
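To make the scale of the system concrete, the per-node figures above can be multiplied out to cluster-wide totals. The following is a minimal sketch of that arithmetic only; the function name `cluster_totals` and the constant names are illustrative and not part of any Baskerville tooling.

```python
# Per-node figures taken from the hardware description above.
NODES = 46
GPUS_PER_NODE = 4
GPU_MEMORY_GB = 40
CPUS_PER_NODE = 2
CORES_PER_CPU = 36
RAM_PER_NODE_GB = 512


def cluster_totals(nodes=NODES):
    """Return aggregate resource counts for `nodes` compute systems.

    Hypothetical helper for illustration; it only multiplies the
    published per-node figures.
    """
    gpus = nodes * GPUS_PER_NODE
    return {
        "gpus": gpus,
        "gpu_memory_gb": gpus * GPU_MEMORY_GB,
        "cpu_cores": nodes * CPUS_PER_NODE * CORES_PER_CPU,
        "ram_gb": nodes * RAM_PER_NODE_GB,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

Under these figures the full system works out to 184 GPUs, 7360GB of GPU memory, 3312 CPU cores and 23552GB of RAM.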

The interconnect is built using NVIDIA Networking HDR switches and adapters arranged in a fat-tree topology, with each compute node having 1x HDR (200Gbps) connection.
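As a rough guide to what one HDR link per node means for multi-node GPU jobs, the nominal line rate can be converted to bytes per second and divided across the four GPUs in a node. This is a sketch under simplifying assumptions: it uses the nominal 200Gbps figure, ignores encoding and protocol overhead, and assumes all four GPUs share the link equally. The function names are illustrative.

```python
# Nominal figures from the description above; real throughput will be lower.
HDR_LINK_GBPS = 200   # gigabits per second per compute node
GPUS_PER_NODE = 4
NODES = 46


def gbps_to_gbytes_per_s(gbps):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


def per_gpu_share_gbytes_per_s(link_gbps=HDR_LINK_GBPS, gpus=GPUS_PER_NODE):
    """Nominal share of the node's link per GPU, assuming equal sharing."""
    return gbps_to_gbytes_per_s(link_gbps) / gpus


def aggregate_injection_tbps(nodes=NODES, link_gbps=HDR_LINK_GBPS):
    """Sum of nominal compute-node link rates, in terabits per second."""
    return nodes * link_gbps / 1000


if __name__ == "__main__":
    print(f"Per node: {gbps_to_gbytes_per_s(HDR_LINK_GBPS)} GB/s")
    print(f"Per GPU (equal share): {per_gpu_share_gbytes_per_s()} GB/s")
    print(f"Aggregate injection: {aggregate_injection_tbps()} Tbps")
```

On these nominal numbers each node can inject 25 GB/s into the fabric, or 6.25 GB/s per GPU when all four GPUs communicate off-node at once.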

In addition to the compute systems, Lenovo DSS-G storage systems running IBM Spectrum Scale provide storage. About 4.6PB of spinning disk provides bulk data storage, and a separate DSS-G system provides SSD-based storage for metadata and acts as a globally shared scratch file system. The storage systems are attached with HDR-100 and provide storage access using RDMA.