Problem
A GPU-enabled instance cannot use the device: the NVIDIA driver is not running, and running "nvidia-smi" returns an error such as:
$> nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
This usually happens after an update of the operating system kernel: the NVIDIA driver must be rebuilt to be compatible with the new kernel.
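You can confirm that the driver module is simply not loaded (rather than the GPU being absent) with standard Linux commands. The dkms check only applies if the driver was installed through DKMS on your instance:

$> uname -r               # kernel currently running
$> lsmod | grep nvidia    # empty output means the nvidia kernel module is not loaded
$> dkms status            # if DKMS is used, lists which kernels the driver is built for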
Solution
You need to reinstall the NVIDIA drivers.
For this, perform the steps documented here: Update GPU Nvidia Driver
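As a rough sketch only (the linked procedure above is the authoritative one): if the driver on your instance is managed by DKMS, it can sometimes be rebuilt against the new kernel without a full reinstall:

$> sudo dkms autoinstall    # rebuild all DKMS-managed modules, including nvidia, for the running kernel
$> sudo reboot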
Once your instance is running, you can check whether it can see the GPU with:
$> nvidia-smi
Tue Nov 17 15:20:38 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.87       Driver Version: 440.87       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID V100-4C        On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |    304MiB /  4096MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+