[AMD Official Use Only - General] > -----Original Message----- > From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx> > Sent: Monday, March 11, 2024 2:50 AM > To: Joshi, Mukul <Mukul.Joshi@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx> > Subject: Re: [PATCH] drm/amdgpu: Handle duplicate BOs during process > restore > > Caution: This message originated from an External Source. Use proper caution > when opening attachments, clicking links, or responding. > > > Am 08.03.24 um 17:22 schrieb Mukul Joshi: > > In certain situations, some apps can import a BO multiple times > > (through IPC for example). To restore such processes successfully, we > > need to tell drm to ignore duplicate BOs. > > While at it, also add additional logging to prevent silent failures > > when process restore fails. > > > > Signed-off-by: Mukul Joshi <mukul.joshi@xxxxxxx> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 14 > ++++++++++---- > > 1 file changed, 10 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > > index bf8e6653341f..65d808d8b5da 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > > @@ -2869,14 +2869,16 @@ int > > amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence > > __rcu * > > > > mutex_lock(&process_info->lock); > > > > - drm_exec_init(&exec, 0); > > + drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES); > > drm_exec_until_all_locked(&exec) { > > list_for_each_entry(peer_vm, &process_info->vm_list_head, > > vm_list_node) { > > ret = amdgpu_vm_lock_pd(peer_vm, &exec, 2); > > drm_exec_retry_on_contention(&exec); > > - if (unlikely(ret)) > > + if (unlikely(ret)) { > > + pr_err("Locking VM PD failed, ret: > > + %d\n", ret); > > goto ttm_reserve_fail; > > + } > > That's a bad idea. Locking can always be interrupted and that would print an > error here. > Thanks Christian. Will send out a patch to change it to pr_debug. Regards, Mukul > Regards, > Christian. > > > } > > > > /* Reserve all BOs and page tables/directory. Add all > > BOs from @@ -2889,8 +2891,10 @@ int > amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence > __rcu * > > gobj = &mem->bo->tbo.base; > > ret = drm_exec_prepare_obj(&exec, gobj, 1); > > drm_exec_retry_on_contention(&exec); > > - if (unlikely(ret)) > > + if (unlikely(ret)) { > > + pr_err("drm_exec_prepare_obj failed, > > + ret: %d\n", ret); > > goto ttm_reserve_fail; > > + } > > } > > } > > > > @@ -2950,8 +2954,10 @@ int > amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence > __rcu * > > * validations above would invalidate DMABuf imports again. > > */ > > ret = process_validate_vms(process_info, &exec.ticket); > > - if (ret) > > + if (ret) { > > + pr_err("Validating VMs failed, ret: %d\n", ret); > > goto validate_map_fail; > > + } > > > > /* Update mappings not managed by KFD */ > > list_for_each_entry(peer_vm, &process_info->vm_list_head,