EUMETSAT infrastructure contains NVIDIA RTX A6000 GPU cards. To use a GPU, provision an instance with one of the following flavors:

Flavor name   vCPU   RAM      vGPU type      vGPU RAM   SSD storage (GB)
vm.a6000.1    2      14 GB    RTXA6000-6C    6 GB       40
vm.a6000.2    4      28 GB    RTXA6000-12C   12 GB      80
vm.a6000.4    8      56 GB    RTXA6000-24C   24 GB      160
vm.a6000.8    16     112 GB   RTXA6000-48C   48 GB      320
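
As a rough aid for choosing among the flavors above, the following sketch prints the smallest flavor whose vGPU RAM meets a given requirement. The helper name pick_flavor is hypothetical and not part of any EUMETSAT tooling; the sizes are copied from the table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the smallest flavor that offers at least
# the requested amount of vGPU RAM (in GB).
pick_flavor() {
    local need_gb="$1"
    local entry
    # flavor:vGPU RAM in GB, ordered from smallest to largest
    for entry in vm.a6000.1:6 vm.a6000.2:12 vm.a6000.4:24 vm.a6000.8:48; do
        if [ "${entry##*:}" -ge "$need_gb" ]; then
            echo "${entry%%:*}"
            return 0
        fi
    done
    echo "no flavor offers ${need_gb} GB of vGPU RAM" >&2
    return 1
}

pick_flavor 10   # prints vm.a6000.2
```

vCPU count, RAM and SSD storage grow together with the vGPU RAM, so the same choice also fixes those resources.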

To use the GPUs:

  1. Provision a new CentOS or Ubuntu instance.
  2. Select a layout ending with eumetsat-gpu and one of the plans (flavors) listed above. Otherwise, configure your instance as preferred and continue the deployment process.
  3. Once the VM is deployed, you can verify the GPU from the command line, for example with the nvidia-smi program (see below for confirming library installations and drivers).
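
The verification in step 3 could be scripted roughly as follows. This is a minimal sketch: check_gpu is a hypothetical helper name, and the hint in the error message is only a guess at the most likely cause.

```shell
#!/usr/bin/env bash
# Sketch: report whether the NVIDIA driver tools and a GPU are visible.
check_gpu() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; was a eumetsat-gpu layout selected?" >&2
        return 1
    fi
    # One CSV line per GPU: name, total memory, driver version
    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader
}

check_gpu || echo "GPU check failed"
```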

Usage

Useful commands

You can view GPU information with nvidia-smi:

$ nvidia-smi
Mon Jan  8 10:24:59 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTXA6000...  On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |   3712MiB / 48895MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
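
For scripts, nvidia-smi can also emit machine-readable CSV instead of the table above. The sketch below computes the share of GPU memory in use; mem_percent is a hypothetical helper name, and the sample values are taken from the output above.

```shell
#!/usr/bin/env bash
# Sketch: turn a "used, total" pair (in MiB) into a whole-number percentage.
mem_percent() {
    local used="${1%%,*}"     # text before the comma
    local total="${1##*,}"    # text after the comma
    total="${total# }"        # drop the space that follows the comma
    echo $(( 100 * used / total ))
}

# Query the live values when the driver tools are installed
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits |
        while read -r line; do
            echo "GPU memory in use: $(mem_percent "$line")%"
        done
fi

mem_percent "3712, 48895"   # prints 7 (integer division)
```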

NVIDIA tools are available in /usr/local/cuda-11.4/bin/. You can add them to PATH as follows:

export PATH=$PATH:/usr/local/cuda-11.4/bin/
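
If the export line is run repeatedly, for example from a shell startup file, PATH accumulates duplicate entries. A small sketch that avoids this, assuming a hypothetical helper named add_to_path:

```shell
#!/usr/bin/env bash
# Sketch: append a directory to PATH only when it is not already listed.
add_to_path() {
    local dir="${1%/}"                 # drop a trailing slash
    case ":$PATH:" in
        *":$dir:"*|*":$dir/:"*) ;;      # already present, nothing to do
        *) PATH="$PATH:$dir" ;;
    esac
}

cuda_bin=/usr/local/cuda-11.4/bin
if [ -d "$cuda_bin" ]; then
    add_to_path "$cuda_bin"
    export PATH
fi
```

Placing these lines in ~/.bashrc makes the change persist across logins.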

Libraries

The CUDA version is currently 11.4. It must match the installed drivers and therefore cannot be changed. TensorFlow compatibility with CUDA versions is listed at https://www.tensorflow.org/install/source#gpu. We have tested that TensorFlow versions newer than 2.6.1 work.
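
To check an installed TensorFlow against the tested range, a version comparison along these lines may help. It is a sketch that assumes GNU sort (for its -V version-sort option) and a python3 interpreter in which TensorFlow is installed; version_gt is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# version_gt A B: succeed when version A is strictly newer than version B.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Read the installed TensorFlow version; empty when it cannot be imported
tf_version=$(python3 -c 'import tensorflow as tf; print(tf.__version__)' 2>/dev/null)

if [ -n "$tf_version" ] && version_gt "$tf_version" 2.6.1; then
    echo "TensorFlow $tf_version is newer than 2.6.1"
else
    echo "TensorFlow is missing or not newer than 2.6.1"
fi
```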

Using Conda

Update and conda installation

# change shell to bash for installations
bash

# update default packages
sudo apt-get update
sudo apt-get upgrade

# Updating may fail with key or dirmngr errors. The commands below are a workaround; after running them, run update and upgrade again.
sudo apt install dirmngr
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys <YOUR-KEY-LIKE-AA16FCBCA621E701>

# install miniforge (or any other conda distribution)
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
chmod +x Miniforge3-Linux-x86_64.sh
./Miniforge3-Linux-x86_64.sh

# When the installer asks whether to run conda init, answer yes:
# Do you wish the installer to initialize Miniforge3
# by running conda init? [yes|no]
# [no] >>> yes

# restart the shell so that the conda initialization takes effect
exit
bash

Library installations

sudo apt install -y docker.io
sudo usermod -aG docker $USER

Confirmation of installations
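
One way to confirm the installations is to check that the expected commands are on PATH and to ask TensorFlow which GPUs it can see. This is a sketch, not a prescribed procedure: report is a hypothetical helper, the list of tools is an assumption, and the last command assumes TensorFlow is installed in the active environment.

```shell
#!/usr/bin/env bash
# Sketch: print OK or MISSING for a command name.
report() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "OK       $1"
    else
        echo "MISSING  $1"
    fi
}

for tool in conda nvidia-smi nvcc python3; do
    report "$tool"
done

# Ask TensorFlow for visible GPUs (prints [] when none are found)
python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))' \
    || echo "TensorFlow could not be imported"
```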



Using Docker

If you want to use GPUs in Docker, you need to take a few extra steps after creating the VM.

  1. Install Docker
    On Ubuntu:

    sudo apt install -y docker.io
    sudo usermod -aG docker $USER

    On CentOS:

    sudo yum-config-manager \
        --add-repo \
        https://download.docker.com/linux/centos/docker-ce.repo
    sudo yum install docker-ce docker-ce-cli containerd.io
    sudo systemctl --now enable docker
    sudo usermod -aG docker $USER
  2. Log out and log in again so that the docker group membership takes effect
  3. Install the NVIDIA Container Toolkit
    On Ubuntu:

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
    sudo systemctl restart docker

    On CentOS:

    distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
        && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
    sudo yum clean expire-cache && sudo yum install -y nvidia-docker2
    sudo systemctl restart docker
  4. Run a GPU-compatible notebook. For example:

    docker run --gpus all -it --rm -v $(realpath ~/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
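
Before starting a notebook, it can be useful to confirm that containers actually reach the GPU. The sketch below wraps such a check in a hypothetical function named docker_gpu_check. It assumes the tensorflow/tensorflow:latest-gpu image (the non-notebook variant of the image used above) and that the image works with the installed driver, which is not guaranteed.

```shell
#!/usr/bin/env bash
# Sketch: list the GPUs that TensorFlow sees from inside a container.
docker_gpu_check() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found; install it first" >&2
        return 1
    fi
    # The image tag is an assumption; pick one that matches the driver
    docker run --rm --gpus all tensorflow/tensorflow:latest-gpu \
        python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
}
```

Running docker_gpu_check after step 3 should print a non-empty device list when the container toolkit is set up correctly. An empty list or an error from docker suggests revisiting steps 1 to 3.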