Re: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work

On 2025-02-12 23:33, Deng, Emily wrote:

From: Yang, Philip <Philip.Yang@xxxxxxx>
Sent: Wednesday, February 12, 2025 10:31 PM
To: Deng, Emily <Emily.Deng@xxxxxxx>; Yang, Philip <Philip.Yang@xxxxxxx>; Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work

 

 

On 2025-02-12 03:54, Deng, Emily wrote:


Ping……

 

Emily Deng

Best Wishes

From: Deng, Emily <Emily.Deng@xxxxxxx>
Sent: Tuesday, February 11, 2025 8:21 PM
To: Deng, Emily <Emily.Deng@xxxxxxx>; Yang, Philip <Philip.Yang@xxxxxxx>; Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: RE: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work

 


Hi Philip,

     Upon further consideration, removing amdgpu_amdkfd_unreserve_mem_limit is challenging because it is paired with amdgpu_amdkfd_reserve_mem_limit in svm_migrate_ram_to_vram. However, this pairing does introduce issues, as it prevents amdgpu_amdkfd_reserve_mem_limit from accurately detecting out-of-memory conditions. Ideally, amdgpu_amdkfd_unreserve_mem_limit should be tied to the actual freeing of memory. Furthermore, since ttm_bo_delayed_delete delays the call to amdgpu_vram_mgr_del, there remains a possibility that amdgpu_amdkfd_reserve_mem_limit reports sufficient memory, while a subsequent call to amdgpu_vram_mgr_new fails. For these reasons, I believe this patch is still necessary.
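Schematically, the pairing looks like this (a simplified sketch of the flow described above, with argument lists elided; not the exact code):

    r = amdgpu_amdkfd_reserve_mem_limit(adev, size, KFD_IOC_ALLOC_MEM_FLAGS_VRAM, ...);
    if (r)
            return r;                        /* the intended early out-of-memory check */

    r = svm_migrate_vma_to_vram(...);        /* the migrated pages now occupy VRAM */

    amdgpu_amdkfd_unreserve_mem_limit(adev, size, KFD_IOC_ALLOC_MEM_FLAGS_VRAM, ...);
    /* the accounting is released here while the VRAM is still in use, so a
     * later reserve can report success even though amdgpu_vram_mgr_new fails */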

 

Emily Deng

Best Wishes

From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Deng, Emily
Sent: Tuesday, February 11, 2025 6:56 PM
To: Yang, Philip <Philip.Yang@xxxxxxx>; Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: RE: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work

 


From: Yang, Philip <Philip.Yang@xxxxxxx>
Sent: Tuesday, February 11, 2025 6:54 AM
To: Deng, Emily <Emily.Deng@xxxxxxx>; Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work

 

 

On 2025-02-10 02:51, Deng, Emily wrote:


From: Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>
Sent: Monday, February 10, 2025 10:18 AM
To: Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work

 

 

On 2/7/2025 9:02 PM, Deng, Emily wrote:

Ping.......
 
Emily Deng
Best Wishes
 
 
 
-----Original Message-----
From: Emily Deng <Emily.Deng@xxxxxxx>
Sent: Friday, February 7, 2025 6:28 PM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Deng, Emily <Emily.Deng@xxxxxxx>
Subject: [PATCH] drm/amdkfd: Fix the deadlock in svm_range_restore_work
 
It will hit a deadlock in svm_range_restore_work randomly.
Details as below:
1.svm_range_restore_work
      ->svm_range_list_lock_and_flush_work
      ->mmap_write_lock
2.svm_range_restore_work
      ->svm_range_validate_and_map
      ->amdgpu_vm_update_range
      ->amdgpu_vm_ptes_update
      ->amdgpu_vm_pt_alloc
      ->svm_range_evict_svm_bo_worker

svm_range_evict_svm_bo_worker is a function run by a kernel task from the default system_wq. It is not the task that runs svm_range_restore_work, which is from system_freezable_wq. The second task may need to wait for the first task to release mmap_write_lock, but there is no cyclic lock dependency.

Can you explain more about how the deadlock happened? If a deadlock exists between two tasks, there should be at least two locks used by both tasks.

Regards

Xiaogang

In step 2, during amdgpu_vm_pt_alloc, the system encounters insufficient memory and triggers an eviction. This kicks off the svm_range_evict_svm_bo_worker task, and the allocation path then waits for the eviction_fence to be signaled. However, svm_range_evict_svm_bo_worker cannot acquire mmap_read_lock(mm), which prevents it from signaling the eviction_fence. As a result, amdgpu_vm_pt_alloc remains incomplete and cannot release mmap_write_lock(mm).

In other words, the svm_range_restore_work task holds mmap_write_lock(mm) and is stuck waiting for the eviction_fence to be signaled by svm_range_evict_svm_bo_worker, while svm_range_evict_svm_bo_worker is itself blocked, unable to acquire mmap_read_lock(mm). This creates a deadlock.
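Putting the two tasks side by side (an illustrative ordering reconstructed from the traces above):

Task A: svm_range_restore_work (system_freezable_wq)
      ->mmap_write_lock(mm)
      ->svm_range_validate_and_map
      ->amdgpu_vm_pt_alloc hits out-of-memory and evicts an SVM BO
      ->waits for eviction_fence          <- blocks while still holding mmap_write_lock

Task B: svm_range_evict_svm_bo_worker (system_wq)
      ->mmap_read_lock(mm)                <- blocks behind Task A's write lock
      ->(would migrate the range back and signal eviction_fence)

Task A waits on a fence that only Task B can signal, and Task B waits on a lock that only Task A can release: a cycle through one lock and one fence.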

This deadlock should not happen, as svm_range_restore_work is only used in the xnack-off case, where there is no VRAM overcommitment thanks to KFD's amdgpu_amdkfd_reserve_mem_limit. We reserve ESTIMATE_PT_SIZE of VRAM for page table allocation precisely to prevent this situation.
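Roughly, the head-room check works like this (simplified from amdgpu_amdkfd_gpuvm.c; the exact field names may differ across kernel versions):

    reserved_for_pt = ESTIMATE_PT_SIZE(amdgpu_amdkfd_total_mem_size);
    if (adev->kfd.vram_used + vram_needed >
        adev->gmc.real_vram_size - reserved_for_pt)
            return -ENOMEM;    /* refuse up front, so page table allocation
                                * under mmap_write_lock never has to evict */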

Regards,

Philip

Hi Philip,

     You're correct. Upon further investigation, the issue arises from the additional call to amdgpu_amdkfd_unreserve_mem_limit in svm_migrate_ram_to_vram, which prevents amdgpu_amdkfd_reserve_mem_limit from detecting the out-of-memory condition. I will submit another patch to remove the amdgpu_amdkfd_unreserve_mem_limit call in svm_migrate_ram_to_vram.

 

We check that all SVM memory fits in system memory; we don't account for SVM VRAM usage. With xnack off, the application should check the available VRAM size and avoid VRAM overcommitment.

svm_range_restore_work ensures all SVM ranges are mapped to the GPUs and then resumes the queues; this is done by taking the mmap write lock and flushing the deferred_range_list. Downgrading to the mmap read lock cannot prevent unmapping from the CPU side, because the mmu notifier callback can add a range to the deferred_range_list again and unmap it from the GPUs, so this patch cannot work.

Maybe I misunderstand, but downgrading to a read lock could also prevent svm_range_deferred_list_work from acquiring the write lock. As a result, it could potentially block unmapping operations from the GPUs.

No, svm_range_cpu_invalidate_pagetables takes the prange lock to split the prange, adds it to the deferred_list if needed, then unmaps it from the GPU and returns.
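In trace form (an illustrative interleaving, assuming the downgrade patch were applied):

1. svm_range_restore_work
      ->mmap_write_lock(mm), flush deferred_range_list
      ->mmap_write_downgrade(mm)
      ->svm_range_validate_and_map            maps prange to GPUs
2. svm_range_cpu_invalidate_pagetables        (mmu notifier callback; does not need the mmap lock)
      ->take prange lock, split prange, add to deferred_range_list
      ->unmap from GPUs and return

A range mapped in step 1 can be unmapped again in step 2 before the queues resume, so holding the read lock guarantees nothing.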

This needs an application fix, not overcommitment: prefetch SVM ranges to VRAM if xnack is off.

Regards,

Philip

 

Emily Deng

Best Wishes

 

 

We should not use the mmap write lock to synchronize with the mmu notifier; there is a plan to rework the SVM locks to fix this.

Regards,

Philip

Emily Deng

Best Wishes

 

 

 

Emily Deng

Best Wishes

 

 

      ->mmap_read_lock (deadlock here, because mmap_write_lock is already held)
 
How to fix?
Downgrade the write lock to a read lock.
 
Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index bd3e20d981e0..c907e2de3dde 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1841,6 +1841,7 @@ static void svm_range_restore_work(struct work_struct *work)
      mutex_lock(&process_info->lock);
      svm_range_list_lock_and_flush_work(svms, mm);
      mutex_lock(&svms->lock);
+      mmap_write_downgrade(mm);
 
      evicted_ranges = atomic_read(&svms->evicted_ranges);
 
@@ -1890,7 +1891,7 @@ static void svm_range_restore_work(struct work_struct *work)
 
out_reschedule:
      mutex_unlock(&svms->lock);
-      mmap_write_unlock(mm);
+      mmap_read_unlock(mm);
      mutex_unlock(&process_info->lock);
 
      /* If validation failed, reschedule another attempt */
--
2.34.1
 
