...

To use the GPUs:

  1. Provision a new CentOS or Ubuntu instance.
  2. Select a layout ending with eumetsat-gpu and one of the plans listed above. Configure the rest of your instance as you prefer and continue the deployment process.
  3. Once the VM is deployed, you can verify the GPUs, for example by running the nvidia-smi program from the command line (see below for confirming library installations and drivers).

...

Code Block
languagebash
titleChecking the GPU drivers
# Log in to your instance and run the command below
$ nvidia-smi

# Check that the output you received shows the NVIDIA-SMI, Driver and CUDA versions. You can also see the GPU hardware (e.g., RTXA6000-6C) and the GPU memory
Mon Feb  5 13:01:43 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTXA6000-6C  On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |    512MiB /  5976MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
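
If you want to record the driver and CUDA versions in a script instead of reading them off the table, the banner line can be parsed. The following is a minimal sketch, not part of the original instructions: the function name smi_versions is hypothetical, and it assumes the banner keeps the "Driver Version: ... CUDA Version: ..." layout shown above.

```shell
# Hypothetical helper: extract the driver and CUDA versions from the
# nvidia-smi banner. It reads nvidia-smi text on stdin, so it can also be
# tried on saved output. Assumes the banner layout shown above.
smi_versions() {
    sed -n '/Driver Version:/{s/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p;q;}'
}

# Only call nvidia-smi when it is actually installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | smi_versions
fi
```

nvidia-smi can also print selected fields directly, for example with `nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader`, which avoids parsing the table at all.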

...

Code Block
languagebash
titleAdding NVIDIA tools to path
# NVIDIA tools are available in /usr/local/cuda-11.8/bin/. You can add them to PATH as follows:
$ export PATH=$PATH:/usr/local/cuda-11.8/bin/
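
The export above only lasts for the current shell session, and running it repeatedly appends the same directory again. A small sketch of an idempotent variant follows; the function name path_append is hypothetical, and the directory is the one named above, which may differ on your instance.

```shell
# Hypothetical helper: append a directory to PATH only if it exists
# and is not already listed.
path_append() {
    _pa_dir="${1%/}"                  # drop a trailing slash for comparison
    [ -d "$_pa_dir" ] || return 1     # skip directories that do not exist
    case ":$PATH:" in
        *":$_pa_dir:"*) ;;            # already present, nothing to do
        *) PATH="$PATH:$_pa_dir" ;;
    esac
    export PATH
}

# Directory taken from the text above; adjust it if your CUDA path differs.
path_append /usr/local/cuda-11.8/bin || echo "CUDA bin directory not found" >&2
```

To make the change permanent, the function and the call could be placed in ~/.bashrc, where the membership check keeps PATH from growing when shells are nested.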

Installing Libraries

You can install libraries in several ways. The basic tutorial below shows how to install libraries such as TensorFlow, Keras and PyTorch with the conda package manager. The CUDA version is currently 11.4; it must match the installed drivers and therefore cannot be changed. TensorFlow's compatibility table is available at: https://www.tensorflow.org/install/source#gpu.

Using conda

Code Block
languagebash
titleConda installation
# install Miniforge (or any other conda manager)
$ wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh

# make it executable
$ chmod +x Miniforge3-Linux-x86_64.sh

# run and install the executable
$ ./Miniforge3-Linux-x86_64.sh

...

Code Block
languagebash
titleLibrary installations
# create a conda environment called ML with Python 3.8
$ conda create -n ML python=3.8

# activate the environment
$ conda activate ML

# install packages; note that installing tensorflow-gpu and keras also installs a number of extra libraries such as the CUDA toolkit, cuDNN (CUDA Deep Neural Network library), NumPy, SciPy and Pillow
(ML) $ conda install tensorflow-gpu keras

# (OPTIONAL) cudatoolkit is installed automatically with keras and tensorflow-gpu, but if you need a specific (or the latest) version, run the command below.
(ML) $ conda install -c anaconda cudatoolkit

# (OPTIONAL) install PyTorch with GPU support; PyTorch might need CUDA 11.8
(ML) $ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Installation Confirmations

Here we run a few initial commands to confirm that the libraries and drivers work with the GPU.

Code Block
languagebash
titleCheck python version
(ML) $ python3 --version
Python 3.8.18
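
As a further sketch of such a confirmation, the commands below ask TensorFlow and PyTorch whether they can see a GPU. The wrapper function check_module is hypothetical and only exists to report a missing library cleanly. The Python calls used are tf.config.list_physical_devices and torch.cuda.is_available, and the sketch assumes the ML environment from the previous step is active.

```shell
# Hypothetical helper: run a Python one-liner only if the named module can
# be imported, so a missing library is reported without a traceback.
check_module() {
    _cm_module="$1"
    _cm_code="$2"
    if python3 -c "import $_cm_module" >/dev/null 2>&1; then
        python3 -c "$_cm_code"
    else
        echo "$_cm_module: not installed in this environment"
        return 1
    fi
}

# Lists the GPUs TensorFlow can see (an empty list means none were found).
check_module tensorflow "import tensorflow as tf; print('TensorFlow GPUs:', tf.config.list_physical_devices('GPU'))" || true

# Prints True when PyTorch can use CUDA.
check_module torch "import torch; print('PyTorch CUDA available:', torch.cuda.is_available())" || true
```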

...

Code Block
languagebash
titleInstall docker on CentOS
$ sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
$ sudo yum install docker-ce docker-ce-cli containerd.io
$ sudo systemctl --now enable docker
$ sudo usermod -aG docker $USER


To let Docker use the GPU, you need to install the NVIDIA Container Toolkit. You can follow the instructions on NVIDIA's website or run the commands below:

Code Block
languagebash
titleInstall necessary packages for GPU support in Docker and restart docker on Ubuntu
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker

...

Code Block
languagebash
titleInstall necessary packages and restart docker on CentOS
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
$ sudo yum clean expire-cache && sudo yum install -y nvidia-docker2
$ sudo systemctl restart docker

Test the install with:

Code Block
titlenvidia-smi in docker test
$ docker run  --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Wed Feb 28 13:20:24 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTXA6000-6C  On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |    512MiB /  5976MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Then run something useful:

Code Block
languagebash
titleRun tensorflow GPU JupyterNotebooks
$ sudo docker run --gpus all --env NVIDIA_DISABLE_REQUIRE=1 -it --rm -v $(realpath ~/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
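
If ~/notebooks does not exist yet, Docker typically creates the bind-mount source itself, and the folder then ends up owned by root. The sketch below wraps the same command in a hypothetical function, run_jupyter, that creates the folder first and can print the command instead of running it. The DRY_RUN variable and the two positional arguments are assumptions for illustration, not part of the original instructions.

```shell
# Hypothetical wrapper around the docker command above.
#   $1  host notebook folder (absolute path, default: $HOME/notebooks)
#   $2  host port for Jupyter (default: 8888)
# Set DRY_RUN=1 to print the command instead of running it.
run_jupyter() {
    _rj_dir="${1:-$HOME/notebooks}"
    _rj_port="${2:-8888}"
    # Create the folder as the current user so it is not created by root.
    mkdir -p "$_rj_dir"
    ${DRY_RUN:+echo} sudo docker run --gpus all --env NVIDIA_DISABLE_REQUIRE=1 \
        -it --rm -v "$_rj_dir:/tf/notebooks" -p "$_rj_port:8888" \
        tensorflow/tensorflow:latest-gpu-jupyter
}

# Example calls (commented out so nothing starts by accident):
# DRY_RUN=1 run_jupyter              # show the command with the defaults
# run_jupyter "$HOME/notebooks" 8888 # start the container
```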

...