Introduction

The EUMETSAT infrastructure contains NVIDIA RTX A6000 GPU cards. To use a GPU, you need to provision an instance with one of the following flavors:

Flavor name   vCPU   RAM      vGPU Type      vGPU RAM   SSD storage (GB)
vm.a6000.1    2      14 GB    RTXA6000-6C    6 GB       40
vm.a6000.2    4      28 GB    RTXA6000-12C   12 GB      80
vm.a6000.4    8      56 GB    RTXA6000-24C   24 GB      160
vm.a6000.8    16     112 GB   RTXA6000-48C   48 GB      320
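
As a rough illustration of how the table can be used, the sketch below encodes the flavors as plain Python data and picks the smallest one that satisfies a required amount of vGPU memory. The `FLAVORS` list and the `smallest_flavor` helper are hypothetical names introduced here for illustration; they are not part of any EUMETSAT tooling.

```python
# Flavor data copied from the table above; the structure and helper are illustrative only.
FLAVORS = [
    {"name": "vm.a6000.1", "vcpu": 2, "ram_gb": 14, "vgpu_ram_gb": 6, "ssd_gb": 40},
    {"name": "vm.a6000.2", "vcpu": 4, "ram_gb": 28, "vgpu_ram_gb": 12, "ssd_gb": 80},
    {"name": "vm.a6000.4", "vcpu": 8, "ram_gb": 56, "vgpu_ram_gb": 24, "ssd_gb": 160},
    {"name": "vm.a6000.8", "vcpu": 16, "ram_gb": 112, "vgpu_ram_gb": 48, "ssd_gb": 320},
]


def smallest_flavor(min_vgpu_ram_gb, min_ram_gb=0):
    """Return the name of the smallest flavor meeting both limits, or None."""
    for flavor in sorted(FLAVORS, key=lambda f: f["vgpu_ram_gb"]):
        if flavor["vgpu_ram_gb"] >= min_vgpu_ram_gb and flavor["ram_gb"] >= min_ram_gb:
            return flavor["name"]
    return None


if __name__ == "__main__":
    # A model needing about 10 GB of GPU memory does not fit the 6 GB flavor.
    print(smallest_flavor(10))
```

With the values from the table, a 10 GB requirement resolves to `vm.a6000.2`, and a request larger than 48 GB returns `None`.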

Provision

To use the GPUs:

  1. Provision a new CentOS or Ubuntu instance.
  2. Select a layout ending with eumetsat-gpu and one of the plans listed above. Otherwise, configure your instance as preferred and continue the deployment process.
  3. Once the VM is deployed, you can verify the GPU from the command line, for example with the nvidia-smi program (see below for confirming library installations and drivers).

Usage

Essential commands

You can view GPU information with nvidia-smi:

Code Block
languagebash
titleChecking the GPU drivers
# Log in to your instance and run the command below.
# Check that the output shows the NVIDIA-SMI, driver and CUDA versions. It also lists the GPU hardware (e.g., RTXA6000-6C) and the GPU memory.
$ nvidia-smi
Mon Feb  5 13:01:43 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTXA6000-6C  On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |    512MiB /  5976MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
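
If the same information is needed inside a script, nvidia-smi can also print selected fields as CSV through its `--query-gpu` and `--format` options. The following is a minimal sketch under that assumption; the helper names `parse_gpu_csv` and `query_gpus` are made up for this example, and the sample line is assembled from the values in the output above rather than captured from a real run.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in the order they appear in each CSV row.
QUERY_FIELDS = ["name", "driver_version", "memory.total", "memory.used"]


def parse_gpu_csv(text):
    """Turn the CSV text printed by nvidia-smi into a list of dicts."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        dict(zip(QUERY_FIELDS, (cell.strip() for cell in row)))
        for row in rows
        if row  # skip empty lines
    ]


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (requires the NVIDIA driver)."""
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    # Hand-written sample line, so the parser can be tried without a GPU.
    sample = "NVIDIA RTXA6000-6C, 470.223.02, 5976 MiB, 512 MiB\n"
    print(parse_gpu_csv(sample))
```

On a GPU instance, `query_gpus()` can be called instead of parsing the sample line.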

Code Block
languagebash
titleAdding NVIDIA tools to path
# NVIDIA tools are available in /usr/local/cuda-11.8/bin/. You can add them to PATH as follows:
$ export PATH=$PATH:/usr/local/cuda-11.8/bin/
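
The same adjustment can be made from within a Python process, which may be handy in scripts that call tools such as nvcc. This is only a sketch: `with_cuda_bin` is a hypothetical helper, and changing `os.environ` affects the current process and its children, not your login shell.

```python
import os
import shutil

CUDA_BIN = "/usr/local/cuda-11.8/bin"  # directory named in the text above


def with_cuda_bin(path_value, cuda_bin=CUDA_BIN):
    """Return path_value with cuda_bin appended, unless it is already listed."""
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    normalized = [entry.rstrip("/") for entry in entries]
    if cuda_bin.rstrip("/") not in normalized:
        entries.append(cuda_bin)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    os.environ["PATH"] = with_cuda_bin(os.environ.get("PATH", ""))
    # shutil.which returns None when nvcc cannot be found on the updated PATH.
    print("nvcc found at:", shutil.which("nvcc"))
```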

Installing Libraries

You can install a variety of libraries using different methods. Below is a basic tutorial showing how to install libraries such as TensorFlow, Keras and PyTorch with the conda package manager. The CUDA version is currently 11.4; it must match the installed drivers and therefore cannot be changed. TensorFlow compatibility information is available at: https://www.tensorflow.org/install/source#gpu. We have tested that TensorFlow versions later than 2.6.1 work.
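
When checking an installed version against the tested range, compare version numbers numerically rather than as strings, since "2.13.1" sorts before "2.6.1" in a plain string comparison. A small sketch of such a check follows; the helper names are illustrative, and the threshold is simply the 2.6.1 figure quoted above.

```python
import re

TESTED_ABOVE = (2, 6, 1)  # versions later than this are reported as tested


def version_tuple(version):
    """Convert a string such as '2.13.1' into a tuple of ints: (2, 13, 1)."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep leading digits, drop suffixes like 'rc1'
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def is_in_tested_range(version):
    """True if the version is later than the 2.6.1 threshold."""
    return version_tuple(version) > TESTED_ABOVE


if __name__ == "__main__":
    for candidate in ["2.6.1", "2.13.1"]:
        print(candidate, is_in_tested_range(candidate))
```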

Using conda

Code Block
languagebash
titleConda installation
# change shell to bash for installations
$ bash

# download the Miniforge installer (any other conda distribution works as well)
$ wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh

# make it executable
$ chmod +x Miniforge3-Linux-x86_64.sh

# run and install the executable
$ ./Miniforge3-Linux-x86_64.sh

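# When the installer asks whether to initialize Miniforge3 by running conda init, answer "yes":
#   Do you wish the installer to initialize Miniforge3
#   by running conda init? [yes|no]
#   [no] >>> yes

# restart the shell so that the conda initialization takes effect
$ exit
$ bash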

Library installations


Code Block
languagebash
titleLibrary installations
# create a conda environment called ML with Python 3.8
$ conda create -n ML python=3.8

# activate the environment
$ conda activate ML

# install packages; note that installing tensorflow-gpu and keras also installs a number of extra libraries such as the CUDA toolkit, cuDNN (CUDA Deep Neural Network library), NumPy, SciPy and Pillow
(ML) $ conda install tensorflow-gpu keras

# (OPTIONAL) cudatoolkit is installed automatically together with keras and tensorflow-gpu, but if you need a specific (or the latest) version, run the command below.
(ML) $ conda install -c anaconda cudatoolkit

# (OPTIONAL) install PyTorch with GPU support; PyTorch might need CUDA 11.8
(ML) $ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Confirmation of installations

The commands below check the installed libraries and drivers to confirm that the libraries are integrated with the GPU.

Code Block
languagebash
titleCheck python version
(ML) $ python3 --version
Python 3.8.18
Code Block
languagebash
titleCheck NVIDIA Cuda compiler driver
(ML) $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Code Block
languagebash
titleCheck TensorFlow installations in Conda environment
(ML) $ conda list | grep tensorflow
tensorflow                2.13.1          cuda118py38h409af0c_1    conda-forge
tensorflow-base           2.13.1          cuda118py38h52ca5c6_1    conda-forge
tensorflow-estimator      2.13.1          cuda118py38ha2f8a09_1    conda-forge
tensorflow-gpu            2.13.1          cuda118py38h0240f8b_1    conda-forge
Code Block
languagebash
titleCheck keras installations in Conda environment
(ML) $ conda list | grep keras
keras                     2.13.1             pyhd8ed1ab_0    conda-forge
Code Block
languagebash
titleEnter python interpreter
(ML) $ python
Code Block
languagepy
titleCheck TensorFlow installation
import tensorflow as tf

tf.test.is_built_with_cuda()

    True


tf.config.list_physical_devices('GPU')

    [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


print(tf.__version__)

    2.13.1
Code Block
languagepy
titleCheck Pytorch and CUDA installation
import torch
print(torch.__version__)

    2.2.0


print(torch.cuda.is_available())

    True


print(torch.version.cuda)

    11.8


if torch.cuda.is_available():
    # Create a tensor and move it to GPU
    x = torch.tensor([1.0, 2.0]).cuda()
    print(x)  # Print the tensor to verify it's on the GPU
else:
    print("CUDA is not available. Check your PyTorch installation.")

    tensor([1., 2.], device='cuda:0')

Using Docker

If you want to use GPUs in Docker, you need to take a few extra steps after creating the VM.


Code Block
languagebash
titleInstall docker on Ubuntu
$ sudo apt install -y docker.io
$ sudo usermod -aG docker $USER


Code Block
languagebash
titleInstall docker on CentOS
$ sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
$ sudo yum install docker-ce docker-ce-cli containerd.io
$ sudo systemctl --now enable docker
$ sudo usermod -aG docker $USER


To let Docker use the GPU, you need to install the NVIDIA Container Toolkit. You can follow the instructions on NVIDIA's website or run the following:

Code Block
languagebash
titleInstall necessary packages for GPU support in Docker and restart docker on Ubuntu
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker


Code Block
languagebash
titleInstall necessary packages and restart docker on CentOS
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
$ sudo yum clean expire-cache && sudo yum install -y nvidia-docker2
$ sudo systemctl restart docker

Test the install with:

Code Block
titlenvidia-smi in docker test
$ docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Wed Feb 28 13:20:24 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTXA6000-6C  On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |    512MiB /  5976MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
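
For automated checks, for instance after rebuilding a VM, the same test can be wrapped in a short Python script. The sketch below is an assumption-laden example: `gpu_test_command` and `run_gpu_test` are hypothetical helpers, and the default image is the one used in the command above.

```python
import shutil
import subprocess


def gpu_test_command(image="nvidia/cuda:11.0.3-base-ubuntu20.04"):
    """Build the docker command that runs nvidia-smi inside a container."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def run_gpu_test():
    """Return True if the containerized nvidia-smi exits successfully."""
    if shutil.which("docker") is None:
        return False  # docker is not installed or not on PATH
    result = subprocess.run(gpu_test_command(), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Only print the command here; call run_gpu_test() to actually execute it.
    print(" ".join(gpu_test_command()))
```

A `False` result from `run_gpu_test()` does not say why the test failed; running the command by hand shows the error message.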

Next, run a GPU-compatible workload, for example a TensorFlow Jupyter notebook server:

Code Block
languagebash
titleRun TensorFlow Jupyter notebooks
$ sudo docker run --gpus all --env NVIDIA_DISABLE_REQUIRE=1 -it --rm -v $(realpath ~/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
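
One detail worth handling before starting the container: if the host directory given to `-v` does not exist, Docker creates it owned by root, which can make the notebooks awkward to edit from the host. The sketch below creates the directory first and then assembles the same command; `prepare_notebook_command` is a hypothetical helper, and the port and image are taken from the command above.

```python
import os

IMAGE = "tensorflow/tensorflow:latest-gpu-jupyter"


def prepare_notebook_command(notebook_dir, port=8888):
    """Create notebook_dir if needed and return the docker command as a list."""
    host_dir = os.path.realpath(os.path.expanduser(notebook_dir))
    os.makedirs(host_dir, exist_ok=True)  # owned by the current user, not root
    return [
        "sudo", "docker", "run", "--gpus", "all",
        "--env", "NVIDIA_DISABLE_REQUIRE=1",
        "-it", "--rm",
        "-v", f"{host_dir}:/tf/notebooks",
        "-p", f"{port}:8888",
        IMAGE,
    ]
```

Calling `prepare_notebook_command("~/notebooks")` returns the argument list, which can be printed with `shlex.join` for copying into a terminal or passed to `subprocess.run`.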