nvidia-smi

nvidia-smi provides monitoring and management capabilities for NVIDIA GPUs from the command line and gives you an instantaneous snapshot of their state.

$ nvidia-smi
Wed Mar  8 14:39:45 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:03:00.0 Off |                    0 |
| N/A   62C    P0   351W / 400W |  39963MiB / 40960MiB |     93%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    181525      C   python                          39960MiB |
+-----------------------------------------------------------------------------+
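
Beyond this summary view, nvidia-smi can print selected fields in machine-readable form, which is convenient for scripting. The query below uses standard field names (run nvidia-smi --help-query-gpu for the full list); the output shown is illustrative, based on the GPU in the example above:

$ nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,memory.total --format=csv
timestamp, utilization.gpu [%], memory.used [MiB], memory.total [MiB]
2023/03/08 14:39:45.123, 93 %, 39963 MiB, 40960 MiB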

The command has many more advanced options. If you want to log your processes' GPU usage during a batch job, you could use the following strategy:


# log process GPU usage to gpu_usage.log every 5 seconds (-d 5),
# stamping each sample with the date and time (-o DT)
nvidia-smi pmon -o DT -d 5 --filename gpu_usage.log &
monitor_pid=$!
your_gpu_workload   # replace with your actual GPU command
# stop the monitor once the workload has finished
kill $monitor_pid

In this example, nvidia-smi logs the processes using the GPU and their resource usage to gpu_usage.log, sampling every 5 seconds and stamping each line with the date and time for easier tracking.
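
For reference, a minimal batch script wrapping a workload this way might look like the sketch below. It assumes a Slurm scheduler; the directives and the srun python train.py workload line are placeholders to adapt to your site and job:

#!/bin/bash
#SBATCH --job-name=gpu-usage-demo   # hypothetical job name
#SBATCH --gres=gpu:1                # request one GPU
#SBATCH --time=01:00:00

# start the monitor in the background, run the workload, then stop it
nvidia-smi pmon -o DT -d 5 --filename gpu_usage.log &
monitor_pid=$!
srun python train.py                # placeholder workload
kill $monitor_pid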

See man nvidia-smi for more information.

nvtop

Nvtop stands for Neat Videocard TOP, an (h)top-like task monitor for GPUs. It can handle multiple GPUs and prints information about them in a way that will be familiar to htop users. It is useful when you want to monitor GPU usage interactively and watch it evolve live.
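
Note that nvtop must run on the machine that hosts the GPUs, so on a cluster you would typically start it on the compute node where your job is running; for example (the node name here is a placeholder):

$ ssh node042   # placeholder: the compute node running your job
$ nvtop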


See man nvtop for all the options.
