RE: [PATCH v4] drm/amdgpu: Fix the race condition for draining retry fault

"Deng, Emily" <Emily.Deng@xxxxxxx> · Fri, 7 Mar 2025 01:27:04 +0000

[AMD Official Use Only - AMD Internal Distribution Only]






 
 



From: Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>


Sent: Friday, March 7, 2025 1:01 AM

To: Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx

Subject: Re: [PATCH v4] drm/amdgpu: Fix the race condition for draining retry fault


 
Thanks for catch up and fix this race condition. It looks good to me. One minor thing below:

On 3/6/2025 12:03 AM, Emily Deng wrote:


Issue:
In the scenario where svm_range_restore_pages is called, but svm->checkpoint_ts
 has not been set and the retry fault has not been drained, svm_range_unmap_from_cpu
is triggered and calls svm_range_free. Meanwhile, svm_range_restore_pages
continues execution and reaches svm_range_from_addr. This results in
a "failed to find prange..." error, causing the page recovery to fail.
 
How to fix:
Move the timestamp check code under the protection of svm->lock.
 
v2:
Make sure all right locks are released before go out.
 
v3:
Directly goto out_unlock_svms, and return -EAGAIN.
 
v4:
Refine code.
 
Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 30 +++++++++++++++-------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index d04725583f19..83ac14bf7a7a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3008,19 +3008,6 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
                goto out;
        }
 
-       /* check if this page fault time stamp is before svms->checkpoint_ts */
-       if (svms->checkpoint_ts[gpuidx] != 0) {
-               if (amdgpu_ih_ts_after_or_equal(ts,  svms->checkpoint_ts[gpuidx])) {
-                       pr_debug("draining retry fault, drop fault 0x%llx\n", addr);
-                       r = 0;
-                       goto out;
-               } else
-                       /* ts is after svms->checkpoint_ts now, reset svms->checkpoint_ts
-                        * to zero to avoid following ts wrap around give wrong comparing
-                        */
-                svms->checkpoint_ts[gpuidx] = 0;
-       }
-
        if (!p->xnack_enabled) {
                pr_debug("XNACK not enabled for pasid 0x%x\n", pasid);
                r = -EFAULT;
@@ -3040,6 +3027,20 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
        mmap_read_lock(mm);
 retry_write_locked:
        mutex_lock(&svms->lock);
+
+       /* check if this page fault time stamp is before svms->checkpoint_ts */
+       if (svms->checkpoint_ts[gpuidx] != 0) {
+               if (amdgpu_ih_ts_after_or_equal(ts,  svms->checkpoint_ts[gpuidx])) {
+                       pr_debug("draining retry fault, drop fault 0x%llx\n", addr);
+                       r = -EAGAIN;

We drop page fault because it is stale, not mean to handle it again. if return -EAGAIN we do amdgpu_gmc_filter_faults_remove. If after unmap, user map same range again we should treat page fault happened at same range as new one. 
Regards
Xiaogang
Sorry, I didn't quite catch that. So, you think we shouldn't remove the fault from amdgpu_gmc_filter_faults_remove?
 
Emily Deng
Best Wishes

 
 

 
+                       goto out_unlock_svms;
+               } else
+                       /* ts is after svms->checkpoint_ts now, reset svms->checkpoint_ts
+                        * to zero to avoid following ts wrap around give wrong comparing
+                        */
+                svms->checkpoint_ts[gpuidx] = 0;
+       }
+
        prange = svm_range_from_addr(svms, addr, NULL);
        if (!prange) {
                pr_debug("failed to find prange svms 0x%p address [0x%llx]\n",
@@ -3165,7 +3166,8 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
        mutex_unlock(&svms->lock);
        mmap_read_unlock(mm);
 
-       svm_range_count_fault(node, p, gpuidx);
+       if (r != -EAGAIN)
+               svm_range_count_fault(node, p, gpuidx);
 
        mmput(mm);
 out: