Unable to Reset GPU: In use by another client
Issue: When you try to reset one or more GPUs with a command such as
sudo nvidia-smi -r [-i <idx>]
or
sudo nvidia-smi --gpu-reset [-i <idx>]
the reset fails with the error message
The following GPUs could not be reset: ... In use by another client
even though no workload appears to be running on the GPUs.
Workaround: This error commonly occurs because services and kernel modules running on the VM or bare-metal instance are still using the GPUs. The simplest solution is to reboot the cloud instance, which also resets all of its GPUs. We generally recommend this approach, because all software and services come up normally after a reboot. However, when rebooting the instance is impossible or highly undesirable, there is a more complex and potentially more fragile way to reset GPUs on the running instance. It is described in Frequently asked questions / How to reset GPUs on a running instance.
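To see what is holding the GPUs before choosing between these options, one possible check is to list the processes that have an NVIDIA device file (/dev/nvidia*) open. Such processes, for example a persistence or monitoring daemon, may not appear in the nvidia-smi process table because they do not necessarily run GPU work. The following is a minimal sketch, assuming a Linux instance with bash and GNU find, run with sudo so that the file descriptors of all processes are visible. The function name list_gpu_holders is hypothetical and is not part of nvidia-smi or any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an NVIDIA tool): print the processes that hold an
# NVIDIA device file such as /dev/nvidia0 or /dev/nvidiactl open.
# Assumes Linux /proc, bash and GNU find; run with sudo to see all processes.
list_gpu_holders() {
  local link pid
  # Entries under /proc/<pid>/fd are symlinks to the opened files;
  # -lname matches symlinks whose target matches the shell pattern.
  find /proc/[0-9]*/fd -maxdepth 1 -lname '/dev/nvidia*' 2>/dev/null |
    while read -r link; do
      pid=${link#/proc/}   # strip the leading /proc/
      pid=${pid%%/*}       # keep only the PID component
      # Print PID, process name and the device file, tab-separated.
      printf '%s\t%s\t%s\n' "$pid" \
        "$(cat "/proc/$pid/comm" 2>/dev/null)" \
        "$(readlink "$link" 2>/dev/null)"
    done | sort -u         # one line per (process, device) pair
}

list_gpu_holders
```

Each output line shows a PID, the process name and the device file it holds. An empty list does not rule out kernel modules using the GPUs, since they are not processes; lsmod | grep nvidia lists the loaded NVIDIA modules together with their use counts. To actually free the GPUs, reboot the instance or follow the FAQ procedure referenced above.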