Introduction
The EUMETSAT infrastructure provides NVIDIA RTX A6000 GPU cards. To use a GPU, you need to provision an instance with one of the following flavors:
Flavor name | vCPU | RAM | vGPU Type | vGPU RAM | SSD storage (GB) |
---|---|---|---|---|---|
vm.a6000.1 | 2 | 14 GB | RTXA6000-6C | 6 GB | 40 |
vm.a6000.2 | 4 | 28 GB | RTXA6000-12C | 12 GB | 80 |
vm.a6000.4 | 8 | 56 GB | RTXA6000-24C | 24 GB | 160 |
vm.a6000.8 | 16 | 112 GB | RTXA6000-48C | 48 GB | 320 |
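When scripting deployments, it can help to pick the smallest flavor that satisfies a memory requirement. The sketch below only encodes the table above as data; the `Flavor` class and the `smallest_flavor` helper are hypothetical names used for illustration and are not part of any EUMETSAT tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flavor:
    name: str
    vcpu: int
    ram_gb: int
    vgpu_type: str
    vgpu_ram_gb: int
    ssd_gb: int


# Values copied from the flavor table above, ordered from smallest to largest.
FLAVORS = [
    Flavor("vm.a6000.1", 2, 14, "RTXA6000-6C", 6, 40),
    Flavor("vm.a6000.2", 4, 28, "RTXA6000-12C", 12, 80),
    Flavor("vm.a6000.4", 8, 56, "RTXA6000-24C", 24, 160),
    Flavor("vm.a6000.8", 16, 112, "RTXA6000-48C", 48, 320),
]


def smallest_flavor(min_vgpu_ram_gb: int, min_ram_gb: int = 0) -> Optional[Flavor]:
    """Return the smallest flavor meeting both requirements, or None."""
    for flavor in FLAVORS:
        if flavor.vgpu_ram_gb >= min_vgpu_ram_gb and flavor.ram_gb >= min_ram_gb:
            return flavor
    return None


if __name__ == "__main__":
    # Hypothetical requirement: a model that needs about 10 GB of GPU memory.
    choice = smallest_flavor(min_vgpu_ram_gb=10)
    print(choice.name if choice else "no flavor is large enough")
```

A requirement larger than 48 GB of vGPU RAM returns `None`, since no listed flavor can satisfy it.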
Provision
To use the GPUs:
- Provision a new CentOS or Ubuntu instance.
- Select a layout ending with eumetsat-gpu and one of the plans listed above. Beyond that, configure your instance as preferred and continue the deployment process.
- Once the VM is deployed, you can verify the GPUs, for example by running the nvidia-smi program from the command line (see below for confirming library installations and drivers).
Usage
Essential commands
You can view GPU information by running nvidia-smi.
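If you want the same information inside a script, one option is to call nvidia-smi with its CSV query mode and parse the result. The following is a minimal sketch, assuming nvidia-smi is on the PATH; `parse_gpu_csv` and `query_gpus` are hypothetical helper names, and the selected query fields are just one reasonable choice.

```python
import shutil
import subprocess
from typing import Dict, List

# Fields requested from nvidia-smi; the order determines the CSV column order.
QUERY_FIELDS = ["name", "driver_version", "memory.total", "memory.used"]


def parse_gpu_csv(text: str) -> List[Dict[str, str]]:
    """Turn `nvidia-smi --format=csv,noheader` output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi and return its report, or an empty list if it is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--query-gpu={','.join(QUERY_FIELDS)}",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

An empty result means nvidia-smi was not found, which usually points to a missing driver installation or an instance without a GPU flavor.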
Adding NVIDIA tools to path
# NVIDIA tools are available in /usr/local/cuda-11.8/bin/. You can add them to PATH as follows:
$ export PATH=$PATH:/usr/local/cuda-11.8/bin/
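The export above only affects the current shell session. If a Python script needs to call tools such as nvcc, it can extend its own PATH in the same way. This is a sketch under the assumption that the tools live in the directory named above; `with_cuda_on_path` is a hypothetical helper.

```python
import os
import shutil

CUDA_BIN = "/usr/local/cuda-11.8/bin"  # directory taken from the text above


def with_cuda_on_path(path: str, cuda_bin: str = CUDA_BIN) -> str:
    """Return `path` with `cuda_bin` appended unless it is already listed."""
    entries = [entry for entry in path.split(os.pathsep) if entry]
    normalized = [entry.rstrip("/") for entry in entries]
    if cuda_bin.rstrip("/") not in normalized:
        entries.append(cuda_bin)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    # Only this process and its child processes see the updated PATH.
    os.environ["PATH"] = with_cuda_on_path(os.environ.get("PATH", ""))
    # shutil.which returns None when nvcc still cannot be found.
    print("nvcc:", shutil.which("nvcc"))
```

Appending, as in the shell command, keeps any tools already on the PATH ahead of the CUDA directory.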
Installing Libraries
You can install libraries in several ways. The basic tutorial below shows how to install libraries such as TensorFlow, Keras and PyTorch with the conda package manager. The CUDA version is currently 11.4; it must match the drivers and therefore cannot be changed. The TensorFlow compatibility table is available at: https://www.tensorflow.org/install/source#gpu.
Using conda
Conda installation
# install miniforge (or any other conda distribution)
$ wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
# make it executable
$ chmod +x Miniforge3-Linux-x86_64.sh
# run the installer
$ ./Miniforge3-Linux-x86_64.sh
Library installations
# create a conda environment called ML with Python 3.8
$ conda create -n ML python=3.8
# activate the environment
$ conda activate ML
# install packages; note that installing tensorflow-gpu and keras also installs a number of extra libraries such as the CUDA toolkit, cuDNN (CUDA Deep Neural Network library), NumPy, SciPy and Pillow
(ML) $ conda install tensorflow-gpu keras
# (OPTIONAL) cudatoolkit is installed automatically while installing keras and tensorflow-gpu, but if you need a specific (or the latest) version, run the command below
(ML) $ conda install -c anaconda cudatoolkit
# (OPTIONAL) install PyTorch with GPU support; PyTorch might need CUDA 11.8
(ML) $ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
Installation Confirmations
Check Python version
(ML) $ python3 --version
Python 3.8.18
Check NVIDIA CUDA compiler driver
(ML) $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Check TensorFlow installations in Conda environment
(ML) $ conda list | grep tensorflow
tensorflow                2.13.1    cuda118py38h409af0c_1    conda-forge
tensorflow-base           2.13.1    cuda118py38h52ca5c6_1    conda-forge
tensorflow-estimator      2.13.1    cuda118py38ha2f8a09_1    conda-forge
tensorflow-gpu            2.13.1    cuda118py38h0240f8b_1    conda-forge
Check Keras installations in Conda environment
(ML) $ conda list | grep keras
keras                     2.13.1    pyhd8ed1ab_0             conda-forge
Enter the Python interpreter
(ML) $ python
Check TensorFlow installation
>>> import tensorflow as tf
>>> tf.test.is_built_with_cuda()
True
>>> tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
>>> print(tf.__version__)
2.13.1
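The same checks can be collected in a small script, which is convenient when verifying several instances. The sketch below additionally reads the CUDA and cuDNN versions TensorFlow was built against through `tf.sysconfig.get_build_info()`; `tensorflow_gpu_report` is a hypothetical helper name, and the CUDA-related keys are only expected to be present in GPU builds.

```python
def tensorflow_gpu_report() -> dict:
    """Collect TensorFlow's CUDA build info and the GPUs it can see."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"installed": False}

    # Build-time information; the CUDA keys may be absent in CPU-only builds.
    build = tf.sysconfig.get_build_info()
    return {
        "installed": True,
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in tensorflow_gpu_report().items():
        print(f"{key}: {value}")
```

An empty `gpus` list together with `built_with_cuda: True` suggests that the build is fine but the driver or CUDA libraries are not visible to TensorFlow.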
Check PyTorch and CUDA installation
>>> import torch
>>> print(torch.__version__)
2.2.0
>>> print(torch.cuda.is_available())
True
>>> print(torch.version.cuda)
11.8
>>> if torch.cuda.is_available():
...     # Create a tensor and move it to the GPU
...     x = torch.tensor([1.0, 2.0]).cuda()
...     print(x)  # Print the tensor to verify it is on the GPU
... else:
...     print("CUDA is not available. Check your PyTorch installation.")
...
tensor([1., 2.], device='cuda:0')
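To confirm that the instance exposes the amount of GPU memory you expect for your flavor, you can also ask PyTorch for the device properties. This is a minimal sketch; `torch_gpu_report` is a hypothetical helper, and the reported total may differ slightly from the nominal vGPU RAM in the flavor table.

```python
def torch_gpu_report() -> list:
    """Return name and total memory (GiB) for each GPU PyTorch can use."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []

    report = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        report.append(
            {
                "index": index,
                "name": props.name,
                # total_memory is reported in bytes; convert to GiB.
                "total_memory_gib": round(props.total_memory / 1024**3, 1),
            }
        )
    return report


if __name__ == "__main__":
    gpus = torch_gpu_report()
    if not gpus:
        print("No CUDA device visible to PyTorch.")
    for gpu in gpus:
        print(gpu)
```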
Using Docker
If you want to use GPUs in Docker, you need to take a few extra steps after creating the VM.
Install Docker on Ubuntu
$ sudo apt install -y docker.io
$ sudo usermod -aG docker $USER
Install Docker on CentOS
$ sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
$ sudo yum install docker-ce docker-ce-cli containerd.io
$ sudo systemctl --now enable docker
$ sudo usermod -aG docker $USER
Install necessary packages and restart Docker on Ubuntu
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
Install necessary packages and restart Docker on CentOS
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
    && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
$ sudo yum clean expire-cache && sudo yum install -y nvidia-docker2
$ sudo systemctl restart docker
Run TensorFlow Jupyter notebooks
$ sudo docker run --gpus all --env NVIDIA_DISABLE_REQUIRE=1 -it --rm -v $(realpath ~/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter