Re: [PATCH] drm/amdkfd: Fix an eviction fence leak

On 2024-09-27 06:36, Lang Yu wrote:
dma_fence_get/put() should be called balanced in
init_kfd_vm() and amdgpu_amdkfd_gpuvm_destroy_cb().

I don't think that's correct. The reference taken in init_kfd_vm is returned to the caller of amdgpu_amdkfd_gpuvm_acquire_process_vm, which stores it in the kfd_process structure, so it is that caller's responsibility to drop the reference. I think the real problem is that we create a new reference for each VM, but there is only one kfd_process structure per process. So the RCU_INIT_POINTER(p->ef, ef); in kfd_process_device_init_vm overwrites p->ef and leaks the references taken for the earlier VMs.

Since we only need to get the eviction fence reference when creating the first VM, I suggest this fix in kfd_process_device_init_vm:

         ret = amdgpu_amdkfd_gpuvm_acquire_process_vm(dev->adev, avm,
                                                      &p->kgd_process_info,
-                                                     &ef);
+                                                     p->ef ? NULL : &ef);

And in init_kfd_vm:

-        *ef = dma_fence_get(&vm->process_info->eviction_fence->base);
+        if (ef)
+                *ef = dma_fence_get(&vm->process_info->eviction_fence->base);
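
To illustrate the accounting, here is a rough user-space model of the two variants. It is only a sketch: every toy_* name is made up for illustration, the fence is reduced to a bare counter, and the guard around the final store to p->ef is an assumption of the sketch (needed so that a NULL ef does not overwrite the stored fence), not something shown in the diffs above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct dma_fence: just a reference count. */
struct toy_fence {
	int refcount;
};

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	f->refcount++;
	return f;
}

static void toy_fence_put(struct toy_fence *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/* Toy stand-in for kfd_process: a single slot for the eviction fence. */
struct toy_process {
	struct toy_fence *ef;
};

/* Models init_kfd_vm(): takes a reference only if the caller wants one. */
static void toy_init_vm(struct toy_fence *eviction_fence,
			struct toy_fence **ef)
{
	if (ef)
		*ef = toy_fence_get(eviction_fence);
}

/* Models the current behavior: a new reference for every VM. */
static void toy_device_init_vm_old(struct toy_process *p,
				   struct toy_fence *eviction_fence)
{
	struct toy_fence *ef = NULL;

	toy_init_vm(eviction_fence, &ef);
	p->ef = ef;	/* overwrites the slot, the earlier reference is lost */
}

/* Models the suggested behavior: a reference for the first VM only. */
static void toy_device_init_vm_new(struct toy_process *p,
				   struct toy_fence *eviction_fence)
{
	struct toy_fence *ef = NULL;

	toy_init_vm(eviction_fence, p->ef ? NULL : &ef);
	if (!p->ef)	/* assumption of this sketch, see the note above */
		p->ef = ef;
}

/*
 * Sets up n_vms VMs for one process, tears everything down, and returns
 * the number of references nobody can drop any more.
 */
static int toy_leaked_refs(int n_vms, int use_fix)
{
	struct toy_fence fence = { .refcount = 1 };	/* process_info's own reference */
	struct toy_process p = { .ef = NULL };
	int i;

	for (i = 0; i < n_vms; i++) {
		if (use_fix)
			toy_device_init_vm_new(&p, &fence);
		else
			toy_device_init_vm_old(&p, &fence);
	}

	if (p.ef)
		toy_fence_put(p.ef);	/* the process drops the one reference it can reach */
	toy_fence_put(&fence);		/* process_info drops its own reference */

	return fence.refcount;
}
```

In this model toy_leaked_refs(3, 0) leaves two references behind (one per VM after the first), while toy_leaked_refs(3, 1) leaves none. With a single VM both variants balance, which would explain why the leak only shows up with more than one VM per process.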

Regards,
  Felix



Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")

Signed-off-by: Lang Yu <lang.yu@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index ce5ca304dba9..c3a4f8d297f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1586,6 +1586,7 @@ void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
  	/* Update process info */
  	mutex_lock(&process_info->lock);
+	dma_fence_put(&process_info->eviction_fence->base);
  	process_info->n_vms--;
  	list_del(&vm->vm_list_node);
  	mutex_unlock(&process_info->lock);
@@ -1598,7 +1599,6 @@ void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
  		WARN_ON(!list_empty(&process_info->userptr_valid_list));
  		WARN_ON(!list_empty(&process_info->userptr_inval_list));
-		dma_fence_put(&process_info->eviction_fence->base);
  		cancel_delayed_work_sync(&process_info->restore_userptr_work);
  		put_pid(process_info->pid);
  		mutex_destroy(&process_info->lock);


