[Public]

ping

>-----Original Message-----
>From: Yu, Lang <Lang.Yu@xxxxxxx>
>Sent: Thursday, April 11, 2024 4:11 PM
>To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>Cc: Koenig, Christian <Christian.Koenig@xxxxxxx>; Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Yu, Lang <Lang.Yu@xxxxxxx>
>Subject: [PATCH v2] drm/amdkfd: make sure VM is ready for updating operations
>
>When page table BOs have been evicted but not yet validated before the
>page tables are updated, the VM is still in the evicting state, so
>amdgpu_vm_update_range returns -EBUSY and restore_process_worker runs
>into an infinite loop.
>
>v2: Split the BO validation and page table update into two separate loops in
>amdgpu_amdkfd_gpuvm_restore_process_bos. (Felix)
>  1. Validate BOs
>  2. Validate VM (and DMABuf attachments)
>  3. Update page tables for the BOs validated above
>
>Fixes: 2fdba514ad5a ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs")
>
>Signed-off-by: Lang Yu <Lang.Yu@xxxxxxx>
>---
> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 34 +++++++++++--------
> 1 file changed, 20 insertions(+), 14 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>index 0ae9fd844623..e2c9e6ddb1d1 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>@@ -2900,13 +2900,12 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
>
> 	amdgpu_sync_create(&sync_obj);
>
>-	/* Validate BOs and map them to GPUVM (update VM page tables). */
>+	/* Validate BOs managed by KFD */
> 	list_for_each_entry(mem, &process_info->kfd_bo_list,
> 			    validate_list) {
>
> 		struct amdgpu_bo *bo = mem->bo;
> 		uint32_t domain = mem->domain;
>-		struct kfd_mem_attachment *attachment;
> 		struct dma_resv_iter cursor;
> 		struct dma_fence *fence;
>
>@@ -2931,6 +2930,25 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
> 				goto validate_map_fail;
> 			}
> 		}
>+	}
>+
>+	if (failed_size)
>+		pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
>+
>+	/* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
>+	 * validations above would invalidate DMABuf imports again.
>+	 */
>+	ret = process_validate_vms(process_info, &exec.ticket);
>+	if (ret) {
>+		pr_debug("Validating VMs failed, ret: %d\n", ret);
>+		goto validate_map_fail;
>+	}
>+
>+	/* Update mappings managed by KFD. */
>+	list_for_each_entry(mem, &process_info->kfd_bo_list,
>+			    validate_list) {
>+		struct kfd_mem_attachment *attachment;
>+
> 		list_for_each_entry(attachment, &mem->attachments, list) {
> 			if (!attachment->is_mapped)
> 				continue;
>@@ -2947,18 +2965,6 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
> 		}
> 	}
>
>-	if (failed_size)
>-		pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
>-
>-	/* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
>-	 * validations above would invalidate DMABuf imports again.
>-	 */
>-	ret = process_validate_vms(process_info, &exec.ticket);
>-	if (ret) {
>-		pr_debug("Validating VMs failed, ret: %d\n", ret);
>-		goto validate_map_fail;
>-	}
>-
> 	/* Update mappings not managed by KFD */
> 	list_for_each_entry(peer_vm, &process_info->vm_list_head,
> 			    vm_list_node) {
>-- 
>2.25.1