Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

NVIDIA tools are available in /usr/local/cuda-11.48/bin/. You can add them to PATH following:

Code Block
$ export PATH=$PATH:/usr/local/cuda-11.48/bin/

Libraries

CUDA version is currently 11.4 which need to be the same with drivers and thus can't be changed.  Tensorflow library compatibility is available at: https://www.tensorflow.org/install/source#gpu. We have tested that TensorFlow > 2.6.1 work.

...

Code Block
$ nvidia-smi
Mon Jan  8 10:24:59 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTXA6000...  On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |   3712MiB / 48895MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

$ python3 --version
Python 3.8.18

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

$ whereis cuda
cuda: /usr/local/cuda

$ cat /home/<USERNAME>/miniforge3/envs/ML/include/cudnn.h
.
.
.
/*   cudnn : Neural Networks Library

*/

#if !defined(CUDNN_H_)
#define CUDNN_H_

#include <cuda_runtime.h>
#include <stdint.h>

#include "cudnn_version.h"
#include "cudnn_ops_infer.h"
#include "cudnn_ops_train.h"
#include "cudnn_adv_infer.h"
#include "cudnn_adv_train.h"
#include "cudnn_cnn_infer.h"
#include "cudnn_cnn_train.h"

#include "cudnn_backend.h"

#if defined(__cplusplus)
extern "C" {
#endif

#if defined(__cplusplus)
}
#endif

#endif /* CUDNN_H_ */

$ conda list | grep tensorflow
tensorflow                2.13.1          cuda118py38h409af0c_1    conda-forge
tensorflow-base           2.13.1          cuda118py38h52ca5c6_1    conda-forge
tensorflow-estimator      2.13.1          cuda118py38ha2f8a09_1    conda-forge
tensorflow-gpu            2.13.1          cuda118py38h0240f8b_1    conda-forge

$ conda list | grep keras
keras                     2.13.1             pyhd8ed1ab_0    conda-forge

$ python
import tensorflow as tf
tf.test.is_built_with_cuda()
True
tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
print(tf.__version__)
2.13.1

# if you installed pytorch using above optional conda installation, check:(OPTIONAL) Check pytorch
$ python
import torch

$ python
print(torch.__version__)  # Print PyTorch version
2.2.0

$ python
print(torch.cuda.is_available())  # Check if CUDA is available
True

$ python
print(torch.version.cuda)  # Print the CUDA version PyTorch is using
11.8

$ python
if torch.cuda.is_available():
    # Create a tensor and move it to GPU
    x = torch.tensor([1.0, 2.0]).cuda()
    print(x)  # Print the tensor to verify it's on the GPU
else:
    print("CUDA is not available. Check your PyTorch installation.")
tensor([1., 2.], device='cuda:0')



#Using Docker

If you want to use GPUs in docker, you need to take few extra steps after creating the VM.

  1. Install Docker 
    In ubuntu:

    Code Block
    sudo apt install -y docker.io
    sudo usermod -aG docker $USER

    In Centos:

    Code Block
    sudo yum-config-manager \
        --add-repo \
        https://download.docker.com/linux/centos/docker-ce.repo
    sudo yum install docker-ce docker-ce-cli containerd.io
    sudo systemctl --now enable docker
    sudo usermod -aG docker $USER


  2. Logout and login again
  3. Install nvidia-container toolkit
    Ubuntu:

    Code Block
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
    sudo systemctl restart docker

    Centos:

    Code Block
    	distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
       && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
    sudo yum clean expire-cache && sudo yum install -y nvidia-docker2
    sudo systemctl restart docker


  4. Run GPU-compatible notebook. For example:

    Code Block
    # might need sudo
    sudo docker run --gpus all --env NVIDIA_DISABLE_REQUIRE=1 -it --rm -v $(realpath ~/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter