Re: [PATCH 5/5] drm/amdkfd: svm deferred work pin mm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2021-11-09 6:04 p.m., Philip Yang wrote:
Make sure mm does not remove when prange deferred work insert mmu range
notifier, to avoid WARNING:

WARNING: CPU: 6 PID: 1787 at mm/mmu_notifier.c:932 __mmu_interval_notifier_insert+0xdd/0xf0
Workqueue: events svm_range_deferred_list_work [amdgpu]
RIP: 0010:__mmu_interval_notifier_insert+0xdd/0xf0
Call Trace:
   svm_range_deferred_list_work+0x156/0x320 [amdgpu]
   process_one_work+0x29f/0x5e0
   worker_thread+0x39/0x3e0

Signed-off-by: Philip Yang <Philip.Yang@xxxxxxx>
---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 2083a10b35c2..fddf0a93d6f1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1992,6 +1992,13 @@ static void svm_range_deferred_list_work(struct work_struct *work)
  			 prange->start, prange->last, prange->work_item.op);
mm = prange->work_item.mm;
+		if (!mmget_not_zero(mm)) {

You can only rely on mmget_not_zero if there is an mmgrab-reference still active. Otherwise the mm_struct may already be freed and allocated for something else and the mm_user counter may be corrupted. You're safe until kfd_process_wq_release calls put_task_struct(p->lead_thread) because the task holds a reference to the mm_struct.

One way to guarantee that at least the mm_struct still exists while this worker runs is, to add cancel_work_sync(&p->svms.deferred_list_work) in kfd_process_notifier_release. We should probably do that anyway.

+			pr_debug("skip range %p as mm is gone\n", prange);
+			spin_lock(&svms->deferred_list_lock);
+			list_del_init(&prange->deferred_list);
+			continue;

If the mm is gone, you can probably just break here. All the items in the list should have the same mm.

That makes me wonder why we need the mm pointer in the work_item at all. We could just get an mm reference from get_task_mm(p->lead_thread) (where p is the container of svms). And you can do that outside the loop. You'll still need a matching mmput call.

Regards,
  Felix


+		}
+
  retry:
  		mmap_write_lock(mm);
  		mutex_lock(&svms->lock);
@@ -2032,6 +2039,7 @@ static void svm_range_deferred_list_work(struct work_struct *work)
  		svm_range_handle_list_op(svms, prange);
  		mutex_unlock(&svms->lock);
  		mmap_write_unlock(mm);
+		mmput(mm);
spin_lock(&svms->deferred_list_lock);
  	}



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux