May 17, 2022



Deploy your containerized AI applications with nvidia-docker

More and more products and services are taking advantage of the modeling and prediction capabilities of AI. This article presents the nvidia-docker tool for integrating AI (Artificial Intelligence) software bricks into a microservice architecture. The main advantage explored here is the use of the host system's GPU (Graphical Processing Unit) resources to accelerate multiple containerized AI applications.

To understand the usefulness of nvidia-docker, we will start by describing what kind of AI can benefit from GPU acceleration. Secondly, we will present how to implement the nvidia-docker tool. Finally, we will describe what tools are available to use GPU acceleration in your applications and how to use them.

Why use GPUs in AI applications?

In the field of artificial intelligence, there are two main subfields in use: machine learning and deep learning. The latter is part of a broader family of machine learning methods based on artificial neural networks.

In the context of deep learning, where operations are essentially matrix multiplications, GPUs are more efficient than CPUs (Central Processing Units). This is why the use of GPUs has grown in recent years. Indeed, GPUs are considered the heart of deep learning because of their massively parallel architecture.

However, GPUs cannot execute just any program. Indeed, they use a specific language (CUDA for NVIDIA) to take advantage of their architecture. So, how can you use and communicate with GPUs from your applications?

The NVIDIA CUDA technology

NVIDIA CUDA (Compute Unified Device Architecture) is a parallel computing architecture combined with an API for programming GPUs. CUDA translates application code into an instruction set that GPUs can execute.

A CUDA SDK and libraries such as cuBLAS (Basic Linear Algebra Subroutines) and cuDNN (Deep Neural Network) have been developed to communicate easily and efficiently with a GPU. CUDA is available in C, C++ and Fortran. There are wrappers for other languages including Java, Python and R. For example, deep learning libraries like TensorFlow and Keras are based on these technologies.

Why use nvidia-docker?

Nvidia-docker addresses the needs of developers who want to add AI functionality to their applications, containerize them and deploy them on servers powered by NVIDIA GPUs.

The objective is to set up an architecture that allows the development and deployment of deep learning models in services available through an API. Thus, the utilization rate of GPU resources is optimized by making them available to multiple application instances.

In addition, we benefit from the advantages of containerized environments:

  • Isolation of instances of each AI model.
  • Colocation of several models with their specific dependencies.
  • Colocation of the same model under several versions.
  • Consistent deployment of models.
  • Model performance monitoring.

Natively, using a GPU in a container requires installing CUDA in the container and giving privileges to access the device. With this in mind, the nvidia-docker tool has been developed, allowing NVIDIA GPU devices to be exposed in containers in an isolated and secure manner.

At the time of writing this article, the latest version of nvidia-docker is v2. This version differs greatly from v1 in the following ways:

  • Version 1: Nvidia-docker is implemented as an overlay to Docker. That is, to create the container, you had to use nvidia-docker (e.g., nvidia-docker run ...), which performs the actions (among others, the creation of volumes) allowing the GPU devices to be seen in the container.
  • Version 2: The deployment is simplified with the replacement of Docker volumes by the use of Docker runtimes. Indeed, to launch a container, it is now necessary to use the NVIDIA runtime via Docker (e.g., docker run --runtime nvidia ...).

Note that due to their different architectures, the two versions are not compatible. An application written for v1 must be rewritten for v2.
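As a quick sanity check of the v2 approach, a CUDA base image can be launched with the NVIDIA runtime; the image tag below is only an example, any CUDA-enabled image behaves the same way:

```shell
# v1 style (deprecated): nvidia-docker run --rm nvidia/cuda nvidia-smi
# v2 style, selecting the NVIDIA runtime explicitly:
docker run --rm --runtime nvidia nvidia/cuda:11.4.2-base-ubuntu20.04 nvidia-smi
```

If everything is wired up correctly, nvidia-smi runs inside the container and lists the host GPUs.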

Setting up nvidia-docker

The required elements to use nvidia-docker are:

  • A container runtime.
  • An available GPU.
  • The NVIDIA Container Toolkit (main part of nvidia-docker).



A container runtime is required to run the NVIDIA Container Toolkit. Docker is the recommended runtime, but Podman and containerd are also supported.

The official documentation provides the installation procedure for Docker.
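On Linux, one way to follow that procedure is the convenience script from the official Docker documentation; this is a sketch, and production installs may prefer the distribution package repositories instead:

```shell
# Install Docker using the official convenience script:
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Verify that the Docker daemon is working:
sudo docker run hello-world
```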


Drivers are required to use a GPU device. In the case of NVIDIA GPUs, the drivers corresponding to a given OS can be obtained from the NVIDIA driver download page, by filling in the information on the GPU model.

The installation of the drivers is done via the executable. For Linux, use the following commands, replacing the name of the downloaded file:

chmod +x NVIDIA-Linux-x86_64-470.94.run
sudo ./NVIDIA-Linux-x86_64-470.94.run

Reboot the host machine at the end of the installation to take the installed drivers into account.
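Once the host is back up, the driver installation can be verified directly on the host:

```shell
# Should print the driver version, CUDA version and the detected GPU(s):
nvidia-smi
```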

Installing nvidia-docker

Nvidia-docker is available on the GitHub project page. To install it, follow the installation guide depending on your server and architecture specifics.
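As an illustration, the installation on an Ubuntu/Debian server looked like the following at the time of writing; check the guide for your own distribution, as repository URLs and package names may change:

```shell
# Add the nvidia-docker package repository for this distribution:
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the package and restart Docker to register the NVIDIA runtime:
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
```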

We now have an infrastructure that allows us to have isolated environments giving access to GPU resources. To use GPU acceleration in applications, several tools have been developed by NVIDIA (non-exhaustive list):

  • CUDA Toolkit: a set of tools for developing software/programs that can perform computations using both CPU, RAM, and GPU. It can be used on x86, Arm and POWER platforms.
  • NVIDIA cuDNN: a library of primitives to accelerate deep learning networks and optimize GPU performance for major frameworks such as TensorFlow and Keras.
  • NVIDIA cuBLAS: a library of GPU-accelerated linear algebra subroutines.

By using these tools in application code, AI and linear algebra tasks are accelerated. With the GPUs now visible, the application is able to send the data and operations to be processed on the GPU.

The CUDA Toolkit is the lowest-level option. It offers the most control (memory and instructions) to build custom applications. Libraries provide an abstraction of CUDA functionality. They allow you to focus on application development rather than the CUDA implementation.
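To illustrate the highest-level option, a framework image that already bundles CUDA and cuDNN can be run through the NVIDIA runtime; the container then only needs access to the host GPU:

```shell
# The tensorflow/tensorflow:latest-gpu image ships with CUDA and cuDNN preinstalled.
# Ask TensorFlow inside the container to enumerate the GPUs it can see:
docker run --rm --runtime nvidia tensorflow/tensorflow:latest-gpu \
  python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```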

Once all these elements are in place, the architecture using the nvidia-docker service is ready to use.

Here is a diagram to summarize everything we have seen:


We have set up an architecture allowing the use of GPU resources from our applications in isolated environments. To summarize, the architecture is composed of the following bricks:

  • Operating system: Linux, Windows …
  • Docker: isolation of the environment using Linux containers
  • NVIDIA driver: installation of the driver for the hardware in question
  • NVIDIA container runtime: orchestration of the previous three
  • Applications on Docker container:
    • CUDA
    • cuDNN
    • cuBLAS
    • TensorFlow/Keras

NVIDIA continues to develop tools and libraries around AI technologies, with the goal of establishing itself as a leader. Other technologies may complement nvidia-docker or may be more suitable than nvidia-docker depending on the use case.