Re: [PATCH] drm/amdgpu: Handle duplicate BOs during process restore

On 2024-03-08 11:22, Mukul Joshi wrote:
In certain situations, some apps can import a BO multiple times
(through IPC for example). To restore such processes successfully,
we need to tell drm to ignore duplicate BOs.
While at it, also add additional logging to prevent silent failures
when process restore fails.

Signed-off-by: Mukul Joshi <mukul.joshi@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 14 ++++++++++----
  1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index bf8e6653341f..65d808d8b5da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2869,14 +2869,16 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
  	mutex_lock(&process_info->lock);

-	drm_exec_init(&exec, 0);
+	drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES);
  	drm_exec_until_all_locked(&exec) {
  		list_for_each_entry(peer_vm, &process_info->vm_list_head,
  				    vm_list_node) {
  			ret = amdgpu_vm_lock_pd(peer_vm, &exec, 2);
  			drm_exec_retry_on_contention(&exec);
-			if (unlikely(ret))
+			if (unlikely(ret)) {
+				pr_err("Locking VM PD failed, ret: %d\n", ret);

pr_err makes sense here: a failure at this point indicates a persistent problem that would cause soft hangs, as in this case.


  				goto ttm_reserve_fail;
+			}
  		}
  		/* Reserve all BOs and page tables/directory. Add all BOs from
@@ -2889,8 +2891,10 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
  			gobj = &mem->bo->tbo.base;
  			ret = drm_exec_prepare_obj(&exec, gobj, 1);
  			drm_exec_retry_on_contention(&exec);
-			if (unlikely(ret))
+			if (unlikely(ret)) {
+				pr_err("drm_exec_prepare_obj failed, ret: %d\n", ret);

Same here, pr_err is fine.


  				goto ttm_reserve_fail;
+			}
  		}
  	}
@@ -2950,8 +2954,10 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
  	 * validations above would invalidate DMABuf imports again.
  	 */
  	ret = process_validate_vms(process_info, &exec.ticket);
-	if (ret)
+	if (ret) {
+		pr_err("Validating VMs failed, ret: %d\n", ret);

I'd make this a pr_debug to avoid spamming the log. Validation can fail intermittently, and the worker is rescheduled to handle that case.

With that fixed, the patch is

Reviewed-by: Felix Kuehling <felix.kuehling@xxxxxxx>


  		goto validate_map_fail;
+	}
  	/* Update mappings not managed by KFD */
  	list_for_each_entry(peer_vm, &process_info->vm_list_head,
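
For readers without the drm_exec code at hand, the effect of DRM_EXEC_IGNORE_DUPLICATES can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: every model_* name and MODEL_* constant below is hypothetical, and the model only mimics the behaviour that a second lock attempt on an already-locked object reports -EALREADY unless the context was initialised with the ignore-duplicates flag.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for DRM_EXEC_IGNORE_DUPLICATES. */
#define MODEL_IGNORE_DUPLICATES (1u << 0)
#define MODEL_MAX_OBJS 16

/* Hypothetical stand-in for a buffer object with a reservation lock. */
struct model_obj {
	int locked;	/* nonzero while the context holds the lock */
};

/* Hypothetical stand-in for struct drm_exec. */
struct model_exec {
	unsigned int flags;
	size_t num_locked;
	struct model_obj *objs[MODEL_MAX_OBJS];
};

static void model_exec_init(struct model_exec *exec, unsigned int flags)
{
	exec->flags = flags;
	exec->num_locked = 0;
}

/*
 * Lock one object. Locking an object this context already holds
 * returns -EALREADY, unless the ignore-duplicates flag is set, in
 * which case the duplicate is silently accepted.
 */
static int model_exec_lock_obj(struct model_exec *exec, struct model_obj *obj)
{
	if (obj->locked) {
		if (exec->flags & MODEL_IGNORE_DUPLICATES)
			return 0;
		return -EALREADY;
	}
	if (exec->num_locked == MODEL_MAX_OBJS)
		return -ENOMEM;
	obj->locked = 1;
	exec->objs[exec->num_locked++] = obj;
	return 0;
}

/* Drop every lock taken through this context. */
static void model_exec_fini(struct model_exec *exec)
{
	while (exec->num_locked)
		exec->objs[--exec->num_locked]->locked = 0;
}

/* Lock every entry of a list that may name the same object twice. */
static int model_lock_all(struct model_exec *exec,
			  struct model_obj **list, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = model_exec_lock_obj(exec, list[i]);
		if (ret)
			return ret;	/* caller unwinds via model_exec_fini() */
	}
	return 0;
}
```

With a list such as { &a, &b, &a }, which models a BO imported twice, model_lock_all() fails with -EALREADY when the context was initialised with flags 0 and succeeds with two objects locked when MODEL_IGNORE_DUPLICATES is passed. That mirrors the restore failure the patch describes and the reason for the flag change.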


