The Magnum IO GPUDirect Storage software that Nvidia released in late 2019 to accelerate AI, analytics and high-performance computing workloads has finally reached 1.0 status after more than a year of beta testing.

The Magnum IO GPUDirect Storage driver lets users bypass the server CPU and transfer data directly between high-performance GPU memory and storage, through a PCIe switch, to reduce I/O latency and increase throughput for the most demanding data-intensive applications.
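
For developers, that bypass surfaces through Nvidia's cuFile API, which ships with GPUDirect Storage. The sketch below, with a placeholder file path and most error handling elided, reads a file straight into GPU memory with no intermediate copy through host RAM; it assumes a GDS-enabled driver stack and a supported filesystem.

```cpp
// Minimal GPUDirect Storage read using Nvidia's cuFile API.
// Build (paths may vary): nvcc gds_read.cu -o gds_read -lcufile
#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <cstdio>
#include <cuda_runtime.h>
#include <cufile.h>

int main() {
    cuFileDriverOpen();                     // initialize the GDS driver

    // Hypothetical path; GDS requires O_DIRECT and a supported filesystem.
    int fd = open("/mnt/nvme/sample.bin", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    CUfileDescr_t descr;
    memset(&descr, 0, sizeof(descr));
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);  // wrap the fd for cuFile I/O

    const size_t size = 1 << 20;            // 1 MiB, for illustration only
    void* devPtr = nullptr;
    cudaMalloc(&devPtr, size);
    cuFileBufRegister(devPtr, size, 0);     // optional: pin buffer for best DMA perf

    // DMA from storage straight into GPU memory, no CPU bounce buffer.
    ssize_t n = cuFileRead(handle, devPtr, size, /*file_offset=*/0, /*buf_offset=*/0);
    printf("read %zd bytes directly into GPU memory\n", n);

    cuFileBufDeregister(devPtr);
    cuFileHandleDeregister(handle);
    cudaFree(devPtr);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```

Without GPUDirect Storage, the same read would land in a host buffer first and then need a separate cudaMemcpy to the device, which is exactly the CPU round trip the driver is designed to eliminate.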

Dion Harris, lead technical product marketing manager of accelerated computing at Nvidia, said GPUDirect Storage lowers CPU utilization by a factor of three and lets CPUs focus on the work they were built for: running processing-intensive applications.

At this week’s ISC High Performance 2021 Digital conference, Nvidia announced that it had added the Magnum IO GPUDirect Storage software to its HGX AI supercomputing platform, along with the new A100 80 GB PCIe GPU and NDR 400G InfiniBand networking. Nvidia had to collaborate with enterprise networking and storage vendors to enable GPUDirect Storage.

Storage vendors support GPUDirect

Storage vendors with generally available products integrating GPUDirect Storage include DataDirect Networks, Vast Data and WekaIO. Others with products in the works include Dell Technologies, Excelero, Hewlett Packard Enterprise, Hitachi Vantara, IBM, Micron, NetApp, Pavilion Data Systems and ScaleFlux.

Steve McDowell, a senior technology analyst at Moor Insights & Strategy, said Nvidia’s GPUDirect Storage software will most often see use with high-performance storage arrays that can deliver the throughput the GPUs demand and support a high-performance remote direct memory access (RDMA) interconnect such as InfiniBand. Examples of GPUDirect Storage pairings include IBM’s Elastic Storage System (ESS) 3200, NetApp’s EF600 all-flash NVMe array and Dell EMC’s PowerScale scale-out NAS system, he said.

“GPUDirect Storage is designed for production-level and heavy research deep-learning environments,” McDowell said, noting the technology targets installations with multiple GPUs working on training algorithms where I/O is a bottleneck.

Nvidia DGX SuperPod with IBM ESS 3200

IBM announced this week that it had updated its storage reference architectures for two-, four- and eight-node Nvidia DGX Pod configurations and committed to supporting a DGX SuperPod with its ESS 3200 by the end of the third quarter. SuperPods start at 20 Nvidia DGX A100 systems and can scale to 140 systems.

Douglas O’Flaherty, program director of portfolio GTM and alliances at IBM, said using GPUDirect Storage on a two-node Nvidia DGX A100 can nearly double the data throughput, from 40 GB per second to 77 GB per second, with a single IBM ESS 3200 running Spectrum Scale.

“What it showcases for Nvidia is just how much data a GPU can start to work through. And what it showcases for us is that, as your developers and applications embrace this, especially for these large data environments, you really need to make sure that you haven’t just moved the bottleneck down into storage,” O’Flaherty said. “With our latest version of ESS 3200, we did a great amount of throughput with just a very few systems.”

O’Flaherty said customers most interested in Nvidia GPUDirect Storage include auto manufacturers working on self-driving vehicles, telecommunications companies with data-heavy natural language processing workloads, financial services firms looking to reduce latency, and genomics businesses with large, complex data sets.

Startup Vast Data has already received a handful of large orders for GPUDirect Storage-enabled systems, according to CMO and co-founder Jeff Denworth. Examples include a media studio doing volumetric data capture to create 3D video, financial services firms running the Apache Spark analytics engine on the Rapids open GPU data science framework, and HPC centers and manufacturers using PyTorch machine learning libraries.

Denworth claimed that using GPUDirect Storage in Rapids and PyTorch projects has enabled Vast Data to feed a conventional Spark or Postgres database about 80 times faster than a traditional NAS system could.

“We’ve been pleasantly surprised by the amount of projects that we’re getting engaged on for this new technology,” Denworth said. “And it’s not only a matter of just making certain AI applications run faster. There’s a whole gamut of GPU-oriented workloads where customers are now starting to gravitate toward this GPUDirect Storage approach as the way to feed these exceptionally hungry machines.”

Carol Sliwa is a TechTarget senior writer covering storage arrays and drives, flash and memory technologies, and enterprise architecture.