After a GPU is unbound, KFD becomes locked and unusable: applications can
no longer use ROCm for compute, and rocminfo fails with the following
error message:

ROCk module is loaded
Unable to open /dev/kfd read-write: Invalid argument

KFD remains locked even after the same GPU is rebound, and a system
reboot is required to unlock it.

Fix this by not locking KFD during GPU unbind: kgd2kfd_suspend() now
skips taking the lock when the DRM device has been unplugged, as it is
during unbind.

Closes: https://github.com/RadeonOpenCompute/ROCm/issues/629
Signed-off-by: Lawrence Yiu <lawyiu.dev@xxxxxxxxx>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc224..c9436039e619 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -949,8 +949,8 @@ void kgd2kfd_suspend(struct kfd_dev *kfd, bool run_pm)
 	if (!kfd->init_complete)
 		return;
 
-	/* for runtime suspend, skip locking kfd */
-	if (!run_pm) {
+	/* for runtime suspend or GPU unbind, skip locking kfd */
+	if (!run_pm && !drm_dev_is_unplugged(adev_to_drm(kfd->adev))) {
 		mutex_lock(&kfd_processes_mutex);
 		count = ++kfd_locked;
 		mutex_unlock(&kfd_processes_mutex);
-- 
2.34.1
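The locking behaviour the patch changes can be illustrated with a small user-space model. This is a hypothetical sketch, not kernel code: `kfd_locked` stands in for the global counter of the same name in kfd_device.c, and the `model_*` helpers and the `unplugged` flag (standing in for drm_dev_is_unplugged()) are invented for illustration. It assumes, as the report suggests, that unbind calls kgd2kfd_suspend() with run_pm false and is never followed by a matching resume, so before the fix each unbind leaves the counter one higher.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the global kfd_locked counter.
 * While it is > 0, new opens of /dev/kfd are refused. */
int kfd_locked = 0;

bool kfd_is_locked(void)
{
	return kfd_locked > 0;
}

/* Hypothetical model of kgd2kfd_suspend() before the patch:
 * every non-runtime suspend (run_pm == false) takes the lock. */
void model_suspend_old(bool run_pm)
{
	if (!run_pm)
		++kfd_locked;
}

/* Hypothetical model of kgd2kfd_suspend() after the patch:
 * the lock is also skipped when the device is unplugged, i.e. during
 * GPU unbind ('unplugged' stands in for drm_dev_is_unplugged()). */
void model_suspend_new(bool run_pm, bool unplugged)
{
	if (!run_pm && !unplugged)
		++kfd_locked;
}

/* Rough model of the unlock done on a non-runtime resume.
 * Assumption: unbind never reaches this path for the removed device. */
void model_resume(bool run_pm)
{
	if (!run_pm)
		--kfd_locked;
}
```

A system suspend/resume pair leaves the counter balanced. An unbind modelled as `model_suspend_old(false)` with no resume leaves `kfd_is_locked()` true for good, which matches the reported symptom; with `model_suspend_new(false, true)` the counter is never incremented, so nothing is left to undo.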