Re: [PATCH] drm/amdgpu: Handle duplicate BOs during process restore

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024-03-11 11:25, Joshi, Mukul wrote:
[AMD Official Use Only - General]

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Sent: Monday, March 11, 2024 2:50 AM
To: Joshi, Mukul <Mukul.Joshi@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: Handle duplicate BOs during process
restore

Caution: This message originated from an External Source. Use proper caution
when opening attachments, clicking links, or responding.


Am 08.03.24 um 17:22 schrieb Mukul Joshi:
In certain situations, some apps can import a BO multiple times
(through IPC for example). To restore such processes successfully, we
need to tell drm to ignore duplicate BOs.
While at it, also add additional logging to prevent silent failures
when process restore fails.

Signed-off-by: Mukul Joshi <mukul.joshi@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 14
++++++++++----
   1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index bf8e6653341f..65d808d8b5da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2869,14 +2869,16 @@ int
amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
__rcu *

       mutex_lock(&process_info->lock);

-     drm_exec_init(&exec, 0);
+     drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES);
       drm_exec_until_all_locked(&exec) {
               list_for_each_entry(peer_vm, &process_info->vm_list_head,
                                   vm_list_node) {
                       ret = amdgpu_vm_lock_pd(peer_vm, &exec, 2);
                       drm_exec_retry_on_contention(&exec);
-                     if (unlikely(ret))
+                     if (unlikely(ret)) {
+                             pr_err("Locking VM PD failed, ret:
+ %d\n", ret);
                               goto ttm_reserve_fail;
+                     }
That's a bad idea. Locking can always be interrupted and that would print an
error here.

Thanks Christian. Will send out a patch to change it to pr_debug.

We cannot get interrupted here because we're in a worker thread. We should be running in non-interruptible mode.

Regards,
  Felix



Regards,
Mukul

Regards,
Christian.

               }

               /* Reserve all BOs and page tables/directory. Add all
BOs from @@ -2889,8 +2891,10 @@ int
amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
__rcu *
                       gobj = &mem->bo->tbo.base;
                       ret = drm_exec_prepare_obj(&exec, gobj, 1);
                       drm_exec_retry_on_contention(&exec);
-                     if (unlikely(ret))
+                     if (unlikely(ret)) {
+                             pr_err("drm_exec_prepare_obj failed,
+ ret: %d\n", ret);
                               goto ttm_reserve_fail;
+                     }
               }
       }

@@ -2950,8 +2954,10 @@ int
amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence
__rcu *
        * validations above would invalidate DMABuf imports again.
        */
       ret = process_validate_vms(process_info, &exec.ticket);
-     if (ret)
+     if (ret) {
+             pr_err("Validating VMs failed, ret: %d\n", ret);
               goto validate_map_fail;
+     }

       /* Update mappings not managed by KFD */
       list_for_each_entry(peer_vm, &process_info->vm_list_head,



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux