NVIDIA Announces the DGX-2 System – 16x Tesla V100 GPUs, 30 TB NVMe Storage for $400K

NVIDIA’s DGX-2 is likely the reason why NVIDIA seems slightly less enamored with the consumer graphics card market as of late. Let’s be honest: just look at that price tag, and imagine the rivers of money NVIDIA is making on each of these systems sold. The data center and deep learning markets have been pouring money into NVIDIA’s coffers, and so the company is focusing its efforts in this space. Case in point: the DGX-2, which sports performance of 1,920 TFLOPS of Tensor processing; 480 TFLOPS of FP16; half that value, at 240 TFLOPS, for FP32 workloads; and 120 TFLOPS on FP64.
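Those aggregate figures are simply per-GPU Tesla V100 throughput multiplied across the sixteen cards. A quick sanity check (the per-GPU values below are back-derived from the article's system totals, not taken from an official spec sheet):

```python
# Hypothetical sanity check: DGX-2 throughput as 16x a single Tesla V100.
# Per-GPU TFLOPS values are derived by dividing the article's totals by 16.
V100_TFLOPS = {
    "tensor": 120.0,  # Tensor-core mixed-precision throughput
    "fp16": 30.0,
    "fp32": 15.0,
    "fp64": 7.5,
}
NUM_GPUS = 16

dgx2_totals = {prec: tflops * NUM_GPUS for prec, tflops in V100_TFLOPS.items()}
for prec, total in dgx2_totals.items():
    print(f"{prec}: {total:g} TFLOPS")
# Tensor 1920, FP16 480, FP32 240, FP64 120 -- matching the article's figures.
```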

NVIDIA’s DGX-2 builds upon the original DGX-1 in every way imaginable. NVIDIA positions these as ready-to-deploy processing powerhouses: everything a prospective user with gargantuan processing needs can deploy in a single system. And the DGX-2 simply runs laps around the DGX-1 (which originally sold for $150K) in every aspect: it features 16x 32 GB Tesla V100 GPUs (the DGX-1 featured 8x 16 GB Tesla GPUs); 1.5 TB of system RAM (the DGX-1 featured a paltry 0.5 TB); 30 TB of NVMe system storage (the DGX-1 sported 8 TB); and even a pair of Xeon Platinum CPUs (admittedly, the smallest performance increase in the whole system).

The DGX-2 has been made possible by NVIDIA’s deployment of what it calls NVSwitch, which enables 300 GB/s chip-to-chip communication – 12 times the speed of PCIe. Paired with the company’s NVLink 2, this allows sixteen GPUs to be grouped together in a single system, for total bandwidth exceeding 14 TB/s. NVIDIA is touting this as a 2-petaflop-capable system, which isn’t hard to imagine given the underlying hardware: 81,920 CUDA cores and 10,240 Tensor cores (the latter being what NVIDIA uses to achieve that 2-petaflop figure, if you were wondering). The DGX-2 consumes power adequate to its innards – some 10 kW in operation – and the whole system weighs 350 pounds.
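The 2-petaflop claim also falls out of the Tensor core count: each Volta Tensor core performs a 4x4x4 matrix multiply-accumulate per clock (64 FMAs, i.e. 128 floating-point operations). A rough back-of-the-envelope check, assuming the V100's ~1.53 GHz boost clock:

```python
# Back-of-the-envelope check of the ~2 PFLOPS Tensor throughput figure.
TENSOR_CORES = 10_240      # 16 GPUs x 640 Tensor cores per V100
FLOPS_PER_CLOCK = 128      # 4x4x4 matrix FMA = 64 FMAs = 128 FLOPs per core
BOOST_CLOCK_HZ = 1.53e9    # assumed ~1530 MHz V100 boost clock

peak_flops = TENSOR_CORES * FLOPS_PER_CLOCK * BOOST_CLOCK_HZ
print(f"{peak_flops / 1e15:.2f} PFLOPS")  # prints "2.01 PFLOPS"
```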

Some of NVIDIA’s remarks about this system follow:

NVSwitch: A Revolutionary Interconnect Fabric

NVSwitch offers 5x higher bandwidth than the best PCIe switch, allowing developers to build systems with more GPUs hyperconnected to each other. It will help developers break through previous system limitations and run much larger datasets. It also opens the door to larger, more complex workloads, including modeling parallel training of neural networks.

NVSwitch extends the innovations made available through NVIDIA NVLink, the first high-speed interconnect technology developed by NVIDIA. NVSwitch allows system designers to build even more advanced systems that can flexibly connect any topology of NVLink-based GPUs.

NVIDIA DGX-2: World’s First Two Petaflop System

NVIDIA’s new DGX-2 system reached the two petaflop milestone by drawing from a wide range of industry-leading technology advances developed by NVIDIA at all levels of the computing stack.

DGX-2 is the first system to debut NVSwitch, which enables all 16 GPUs in the system to share a unified memory space. Developers now have the deep learning training power to tackle the largest datasets and most complex deep learning models.

Combined with a fully optimized, updated suite of NVIDIA deep learning software, DGX-2 is purpose-built for data scientists pushing the outer limits of deep learning research and computing. DGX-2 can train FAIRSeq, a state-of-the-art neural machine translation model, in less than two days – a 10x improvement in performance from the DGX-1 with Volta, introduced in September.

Source: AnandTech