[AMD Official Use Only - AMD Internal Distribution Only]

Hi Felix,
    You are right, it is easy to hit a deadlock this way; I don't know why LOCKDEP doesn't catch it. We need to find another solution.

Hi Philip,
    Do you have a solution for this delayed PT free?

Emily Deng
Best Wishes

>-----Original Message-----
>From: Deng, Emily <Emily.Deng@xxxxxxx>
>Sent: Tuesday, January 7, 2025 10:34 AM
>To: Deng, Emily <Emily.Deng@xxxxxxx>; Kuehling, Felix
><Felix.Kuehling@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Yang, Philip
><Philip.Yang@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>
>Subject: RE: [PATCH v2] drm/amdgpu: Fix the looply call svm_range_restore_pages
>issue
>
>[AMD Official Use Only - AMD Internal Distribution Only]
>
>Ping....
>How about this? Currently it is easy to reproduce the issue in our environment, and we
>need this change to fix it.
>
>Emily Deng
>Best Wishes
>
>
>
>>-----Original Message-----
>>From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
>>Deng, Emily
>>Sent: Monday, January 6, 2025 9:52 AM
>>To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>;
>>amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Yang, Philip <Philip.Yang@xxxxxxx>;
>>Koenig, Christian <Christian.Koenig@xxxxxxx>
>>Subject: RE: [PATCH v2] drm/amdgpu: Fix the looply call
>>svm_range_restore_pages issue
>>
>>[AMD Official Use Only - AMD Internal Distribution Only]
>>
>>>-----Original Message-----
>>>From: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
>>>Sent: Saturday, January 4, 2025 7:18 AM
>>>To: Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx;
>>>Yang, Philip <Philip.Yang@xxxxxxx>; Koenig, Christian
>>><Christian.Koenig@xxxxxxx>
>>>Subject: Re: [PATCH v2] drm/amdgpu: Fix the looply call
>>>svm_range_restore_pages issue
>>>
>>>
>>>On 2025-01-02 21:26, Emily Deng wrote:
>>>> Because of the delayed PT free, a PT BO that is queued for freeing can be
>>>> reused, which causes an unexpected page fault, and the fault handler then
>>>> calls svm_range_restore_pages.
>>>>
>>>> Details below:
>>>> 1. The code below wants to free the PT, but the BO is not freed
>>>> immediately; freeing is deferred via "schedule_work(&vm->pt_free_work);".
>>>>
>>>> [   92.276838] Call Trace:
>>>> [   92.276841]  dump_stack+0x63/0xa0
>>>> [   92.276887]  amdgpu_vm_pt_free_list+0xfb/0x120 [amdgpu]
>>>> [   92.276932]  amdgpu_vm_update_range+0x69c/0x8e0 [amdgpu]
>>>> [   92.276990]  svm_range_unmap_from_gpus+0x112/0x310 [amdgpu]
>>>> [   92.277046]  svm_range_cpu_invalidate_pagetables+0x725/0x780 [amdgpu]
>>>> [   92.277050]  ? __alloc_pages_nodemask+0x19f/0x3e0
>>>> [   92.277051]  mn_itree_invalidate+0x72/0xc0
>>>> [   92.277052]  __mmu_notifier_invalidate_range_start+0x48/0x60
>>>> [   92.277054]  migrate_vma_collect+0xf6/0x100
>>>> [   92.277055]  migrate_vma_setup+0xcf/0x120
>>>> [   92.277109]  svm_migrate_ram_to_vram+0x256/0x6b0 [amdgpu]
>>>>
>>>> 2. svm_range_map_to_gpu -> amdgpu_vm_update_range updates the page
>>>> table and reuses the same entry BO that is pending free from step 1.
>>>>
>>>> 3. Then pt_free_work runs and frees the BO. From then on, GPU accesses
>>>> through that PT cause page faults because the PT BO has been freed, and
>>>> svm_range_restore_pages is called again and again.
>>>>
>>>> How to fix?
>>>> Add a workqueue, and flush the workqueue each time before updating the
>>>> page table.
>>>
>>>I think this is kind of a known issue in the GPUVM code. Philip was
>>>looking at it before.
>>>
>>>Just flushing a workqueue may seem like a simple and elegant solution,
>>>but I'm afraid it introduces lock dependency issues.
>>>By flushing the workqueue, you're effectively creating a lock dependency
>>>of the MMU notifier on any locks held inside the worker function. You now
>>>get a circular lock dependency with any of those locks and memory reclaim.
>>>I think LOCKDEP would be able to catch that if you compile your kernel
>>>with that feature enabled.
>>>
>>>The proper solution is to prevent delayed freeing of page tables if
>>>they happened to get reused, or prevent reuse of page tables if they
>>>are flagged for delayed freeing.
>>>
>>>Regards,
>>>  Felix
>>>
>>Thanks, I have already enabled LOCKDEP while compiling the kernel. The delayed
>>free work seems to exist for other reasons, so I am not sure whether it can be
>>removed completely.
>>
>>Emily Deng
>>Best Wishes
>>>
>>>>
>>>> Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h              | 1 +
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 1 +
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c           | 7 +++++--
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c        | 6 +++++-
>>>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c             | 3 +++
>>>>  5 files changed, 15 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index 93c352b08969..cbf68ad1c8d0 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -1188,6 +1188,7 @@ struct amdgpu_device {
>>>>  	struct mutex			enforce_isolation_mutex;
>>>>
>>>>  	struct amdgpu_init_level *init_lvl;
>>>> +	struct workqueue_struct *wq;
>>>>  };
>>>>
>>>>  static inline uint32_t amdgpu_ip_version(const struct amdgpu_device *adev,
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> index f30548f4c3b3..5b4835bc81b3 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>>>> @@ -2069,6 +2069,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
>>>>  		if (ret)
>>>>  			goto out;
>>>>  	}
>>>> +	flush_workqueue(adev->wq);
>>>>
>>>>  	ret = reserve_bo_and_vm(mem, avm, &ctx);
>>>>  	if (unlikely(ret))
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> index 9d6ffe38b48a..500d97cd9114 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>>> @@ -2607,7 +2607,7 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>>>>  	amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm);
>>>>
>>>>  	flush_work(&vm->pt_free_work);
>>>> -
>>>> +	cancel_work_sync(&vm->pt_free_work);
>>>>  	root = amdgpu_bo_ref(vm->root.bo);
>>>>  	amdgpu_bo_reserve(root, true);
>>>>  	amdgpu_vm_put_task_info(vm->task_info);
>>>> @@ -2708,6 +2708,8 @@ void amdgpu_vm_manager_init(struct amdgpu_device *adev)
>>>>  #endif
>>>>
>>>>  	xa_init_flags(&adev->vm_manager.pasids, XA_FLAGS_LOCK_IRQ);
>>>> +	adev->wq = alloc_workqueue("amdgpu_recycle",
>>>> +				   WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 16);
>>>>  }
>>>>
>>>>  /**
>>>> @@ -2721,7 +2723,8 @@ void amdgpu_vm_manager_fini(struct amdgpu_device *adev)
>>>>  {
>>>>  	WARN_ON(!xa_empty(&adev->vm_manager.pasids));
>>>>  	xa_destroy(&adev->vm_manager.pasids);
>>>> -
>>>> +	flush_workqueue(adev->wq);
>>>> +	destroy_workqueue(adev->wq);
>>>>  	amdgpu_vmid_mgr_fini(adev);
>>>>  }
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
>>>> index f78a0434a48f..1204406215ee 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
>>>> @@ -554,15 +554,19 @@ void amdgpu_vm_pt_free_work(struct work_struct *work)
>>>>
>>>>  	vm = container_of(work, struct amdgpu_vm, pt_free_work);
>>>>
>>>> +	printk("Emily:%s\n", __func__);
>>>>  	spin_lock(&vm->status_lock);
>>>>  	list_splice_init(&vm->pt_freed, &pt_freed);
>>>>  	spin_unlock(&vm->status_lock);
>>>> +	printk("Emily:%s 1\n", __func__);
>>>>
>>>>  	/* flush_work in amdgpu_vm_fini ensure vm->root.bo is valid. */
>>>>  	amdgpu_bo_reserve(vm->root.bo, true);
>>>> +	printk("Emily:%s 2\n", __func__);
>>>>
>>>>  	list_for_each_entry_safe(entry, next, &pt_freed, vm_status)
>>>>  		amdgpu_vm_pt_free(entry);
>>>> +	printk("Emily:%s 3\n", __func__);
>>>>
>>>>  	amdgpu_bo_unreserve(vm->root.bo);
>>>>  }
>>>> @@ -589,7 +593,7 @@ void amdgpu_vm_pt_free_list(struct amdgpu_device *adev,
>>>>  	spin_lock(&vm->status_lock);
>>>>  	list_splice_init(&params->tlb_flush_waitlist, &vm->pt_freed);
>>>>  	spin_unlock(&vm->status_lock);
>>>> -	schedule_work(&vm->pt_free_work);
>>>> +	queue_work(adev->wq, &vm->pt_free_work);
>>>>  	return;
>>>>  }
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> index 3e2911895c74..55edf96d5a95 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>>> @@ -1314,6 +1314,7 @@ svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>>>>  	uint64_t init_pte_value = 0;
>>>>
>>>>  	pr_debug("[0x%llx 0x%llx]\n", start, last);
>>>> +	flush_workqueue(adev->wq);
>>>>
>>>>  	return amdgpu_vm_update_range(adev, vm, false, true, true, false, NULL, start,
>>>>  				      last, init_pte_value, 0, 0, NULL, NULL,
>>>> @@ -1422,6 +1423,8 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange,
>>>>  	 * different memory partition based on fpfn/lpfn, we should use
>>>>  	 * same vm_manager.vram_base_offset regardless memory partition.
>>>>  	 */
>>>> +	flush_workqueue(adev->wq);
>>>> +
>>>>  	r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, true,
>>>>  				   NULL, last_start, prange->start + i, pte_flags,
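
For readers following the lock-dependency argument above, here is a minimal, self-contained sketch of the cycle Felix describes. All toy_* names are hypothetical and this is not amdgpu code; it only illustrates how flushing a deferred-free workqueue from an MMU notifier ties the notifier to a lock that is also held around allocations that can enter reclaim:

/*
 * Hypothetical sketch of the dependency cycle discussed above (not amdgpu code).
 *
 * Cycle: drv->lock -> fs_reclaim -> MMU notifier -> flush_workqueue()
 *        -> worker completion -> drv->lock
 */
#include <linux/workqueue.h>
#include <linux/mutex.h>
#include <linux/slab.h>

struct toy_drv {
	struct workqueue_struct *wq;	/* deferred page-table free queue */
	struct work_struct free_work;
	struct mutex lock;		/* protects page-table state */
};

/* Deferred free worker: takes drv->lock (amdgpu_vm_pt_free_work similarly
 * reserves vm->root.bo before freeing). */
static void toy_free_worker(struct work_struct *work)
{
	struct toy_drv *drv = container_of(work, struct toy_drv, free_work);

	mutex_lock(&drv->lock);
	/* ... free the deferred page-table objects ... */
	mutex_unlock(&drv->lock);
}

/* Invalidation callback, reached from migration/reclaim via MMU notifiers.
 * Flushing here makes the notifier wait for toy_free_worker(), and hence
 * depend on drv->lock. */
static void toy_invalidate(struct toy_drv *drv)
{
	flush_workqueue(drv->wq);
	/* ... unmap the invalidated range ... */
}

/* Regular update path: holds drv->lock across a GFP_KERNEL allocation that
 * may enter direct reclaim, which may fire MMU notifiers -> toy_invalidate().
 * That closes the circle. */
static void toy_update(struct toy_drv *drv)
{
	void *tmp;

	mutex_lock(&drv->lock);
	tmp = kmalloc(4096, GFP_KERNEL);	/* can recurse into reclaim */
	/* ... update page tables ... */
	mutex_unlock(&drv->lock);
	kfree(tmp);
}

With such a cycle, LOCKDEP (CONFIG_PROVE_LOCKING) should normally complain once both the flush path and the allocation-under-lock path have been exercised in the same run; if it stays silent, one of the edges may simply never have been hit during the test.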