How do I fix a "CUDA busy or unavailable" error on an H100 after disabling NVIDIA MIG?
After disabling NVIDIA MIG on an H100, CUDA applications may fail with an error such as "CUDA-capable device(s) is/are busy or unavailable". Which fix applies depends on the root cause, so work through the checks below in order.
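First, ensure that the NVIDIA driver is properly installed and recent enough for the H100. The often-quoted 450.36.06 is the minimum driver for CUDA 11.0 and predates the Hopper architecture; the H100 needs a much newer data center driver (an R515-series release or later), so confirm the exact minimum for your H100 variant in the driver release notes. If the installed driver is older, download and install a current version from the NVIDIA website.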
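As a rough illustration of the driver check, the sketch below reads the installed driver version with nvidia-smi and compares it against a minimum. The MIN_DRIVER value and the helper names version_ge and check_driver are assumptions made for this example, not values taken from NVIDIA documentation; substitute the minimum listed in the release notes for your card. It relies on GNU `sort -V` for the version comparison.

```shell
#!/bin/sh
# Sketch: compare the installed NVIDIA driver version against a minimum.
# MIN_DRIVER is an assumed placeholder; confirm the real minimum for your
# H100 variant in NVIDIA's driver release notes.
MIN_DRIVER="${MIN_DRIVER:-515.65.01}"

# version_ge A B: succeeds when dotted version A is >= B.
# Uses `sort -V` (GNU coreutils version sort): if B sorts first, A >= B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_driver VERSION: prints a verdict for the given driver version.
check_driver() {
    if version_ge "$1" "$MIN_DRIVER"; then
        echo "ok: driver $1 >= $MIN_DRIVER"
    else
        echo "too old: driver $1 < $MIN_DRIVER"
    fi
}

# Only query the GPU when nvidia-smi is installed and responds.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null) || installed=""
    installed=$(printf '%s\n' "$installed" | head -n 1)
    if [ -n "$installed" ]; then
        check_driver "$installed"
    else
        echo "nvidia-smi could not talk to the driver; it may not be loaded"
    fi
else
    echo "nvidia-smi not found; the driver may not be installed"
fi
```

Setting MIN_DRIVER in the environment before running the script overrides the placeholder.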
If the driver is up to date and the issue persists, check that the H100 is properly seated, that its power cables are securely connected, and that the operating system can still see the card. A GPU that no longer appears on the PCIe bus points to a hardware or power problem rather than a CUDA one.
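One way to confirm that the system still sees the card is to look for it on the PCIe bus and then ask the driver to list its GPUs. The sketch below is a minimal example of that idea; the helper name count_nvidia_pci is made up for illustration, and matching on the text "nvidia" in lspci output is a simplification.

```shell
#!/bin/sh
# Sketch: check whether the OS and the driver can see an NVIDIA GPU.

# count_nvidia_pci: reads `lspci` output on stdin and prints how many
# lines mention NVIDIA (case-insensitive). Prints 0 when none match.
count_nvidia_pci() {
    grep -i -c 'nvidia' || true
}

# Step 1: is the device present on the PCIe bus at all?
if command -v lspci >/dev/null 2>&1; then
    n=$(lspci | count_nvidia_pci)
    echo "NVIDIA devices on the PCIe bus: $n"
else
    echo "lspci not found (usually provided by the pciutils package)"
fi

# Step 2: does the driver enumerate the GPU?
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || echo "nvidia-smi could not list GPUs"
else
    echo "nvidia-smi not found"
fi
```

If the first step reports zero devices, the problem is below the driver level, and a GPU reset will not help.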
Another possible solution is to reset the GPU with the nvidia-smi command-line utility. Run the following command:
nvidia-smi --gpu-reset -i <GPU ID>
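Replace <GPU ID> with the index of the H100 as listed by nvidia-smi -L. The reset requires root privileges and fails while any process is still using the GPU, so stop all CUDA applications first. A successful reset may resolve the CUDA busy or unavailable issue.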
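Before resetting, it can help to confirm that MIG really reports as disabled and that no compute process is still attached to the GPU. The sketch below shows one way to do that; GPU_ID defaults to 0 as an assumption, and the helper name mig_fully_disabled is invented for this example. It only prints the reset command instead of running it.

```shell
#!/bin/sh
# Sketch: pre-flight checks before `nvidia-smi --gpu-reset`.
# GPU_ID is an assumed index; set it to the index of your H100.
GPU_ID="${GPU_ID:-0}"

# mig_fully_disabled "CURRENT, PENDING": succeeds only when both the
# current and the pending MIG mode are reported as Disabled.
mig_fully_disabled() {
    [ "$(printf '%s' "$1" | tr -d ' ')" = "Disabled,Disabled" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Current and pending MIG mode, e.g. "Disabled, Disabled".
    mig=$(nvidia-smi -i "$GPU_ID" \
        --query-gpu=mig.mode.current,mig.mode.pending \
        --format=csv,noheader 2>/dev/null)
    # Compute processes that still hold a CUDA context on this GPU.
    apps=$(nvidia-smi -i "$GPU_ID" \
        --query-compute-apps=pid,process_name \
        --format=csv,noheader 2>/dev/null)

    if ! mig_fully_disabled "$mig"; then
        echo "MIG mode (current, pending) is '$mig'; disable MIG first"
    elif [ -n "$apps" ]; then
        echo "compute processes still running on GPU $GPU_ID:"
        echo "$apps"
    else
        echo "GPU $GPU_ID looks idle; try: sudo nvidia-smi --gpu-reset -i $GPU_ID"
    fi
else
    echo "nvidia-smi not found"
fi
```

Processes that hold the device nodes without a CUDA context, such as monitoring agents, do not appear in the compute process list; running `sudo fuser -v /dev/nvidia*` may reveal them.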
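If the issue still persists, do not look for a fix in the NVIDIA Control Panel: it is a Windows desktop utility, and its Manage 3D Settings page has no option to disable or re-enable a data center GPU such as the H100. On a Linux server, the equivalent step is to stop every process that holds the GPU (including monitoring agents and the nvidia-persistenced daemon), then reload the NVIDIA kernel modules or reboot the system so the driver reinitializes the card with MIG disabled.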
If none of these solutions works, contact NVIDIA support for further assistance. Attaching the log produced by the nvidia-bug-report.sh script that ships with the driver will help them diagnose issues with the H100 or the NVIDIA driver.