Re: [PATCH] drm/amdkfd: Fix some issues at userptr buffer validation process.

"Chen, Xiaogang" <xiaogang.chen@xxxxxxx> · Thu, 13 Apr 2023 22:27:37 -0500

On 4/13/2023 3:08 PM, Felix Kuehling wrote:
Am 2023-04-12 um 02:14 schrieb Xiaogang.Chen:
From: Xiaogang Chen<xiaogang.chen@xxxxxxx>

Notice userptr buffer restore process has following issues:

1: amdgpu_ttm_tt_get_user_pages can fail(-EFAULT). If it failed we 
should not set
it valid(mem->invalid = 0). In this case mem has no associated hmm 
range or user_pages
associated.

We don't want to suspend the process indefinitely when this happens. 
This can happen if usermode calls munmap before unregistering the 
userptr. What we want to happen in this case is, the process should 
resume. If it accesses the virtual address, it will result in a page 
fault, which alerts the application to its mistake. If it doesn't 
access the virtual address, then there is no harm.

It's a good catch that there is no useful hmm_range in this case to 
check validity, so we should not warn about it in 
confirm_valid_user_pages_locked.

Not sure why you said "suspend the process indefinitely". When 
mem(kgd_mem) has no hmm_range due to amdgpu_ttm_tt_get_user_pages fail 
the patch does not mark it valid( set mem->invalid != 0) and keep it at 
userptr_inval_list. The process has not been suspended.

Yes, in this customer application case amdgpu_ttm_tt_get_user_pages 
failed at vma_lookup. I think it unmap its buffer before unregister from 
KFD. This patch handles amdgpu_ttm_tt_get_user_pages in general: not 
mark it valid(mem->invalid != 0), keep it at userptr_inval_list, then at 
confirm_valid_user_pages_locked, check if mem has hmm range associated 
before WARN.

2: mmu notifier can happen concurrently and update 
mem->range->notifier->invalidate_seq,
but not mem->range->notifier_seq. That causes 
mem->range->notifier_seq stale
when mem is in process_info->userptr_inval_list and 
amdgpu_amdkfd_restore_userptr_worker
got interrupted. At next rescheduled next attempt we use stale 
mem->range->notifier_seq
to compare with mem->range->notifier->invalidate_seq.

amdgpu_hmm_range_get_pages updates mem->range->notifier_seq with the 
current mem->range->notifier->invalidate_seq. If an eviction happens 
after this, there is a collision and the range needs to be 
revalidated. I think when you say "mem->range->notifier_seq is stale", 
it means there was a collision. When this happens, mem->invalid should 
be set to true at the same time. So confirm_valid_user_pages_locked 
should not complain because mem->invalid and 
amdgpu_ttm_tt_get_user_pages_done should agree that the range is invalid.

Yes, "mem->range->notifier_seq is stale" means it is different from 
mem->range->notifier_seq. It is caused by mmu notifier happened 
concurrently on same buffer that is still in restore process. For this 
case the patch update mem->range->notifier_seq:

+            if (mem->range)
+                mem->range->notifier_seq = 
mem->range->notifier->invalidate_seq;

Then restore process will skip confirm_valid_user_pages_locked, jump to 
next scheduled attempt: "goto unlock_notifier_out".

"At next rescheduled next attempt we use stale 
mem->range->notifier_seq": This is not really stale. The notifier_seq 
indicates whether the pages returned by the last call to 
amdgpu_hmm_range_get_pages are still valid. If it's "stale", it means 
an invalidation (evict_userptr) happened and we need to 
amdgpu_hmm_range_get_pages again. In theory, if an invalidation 
happened since the last call, then mem->invalid should also be true. 
So again, the sequence numbers and mem->invalid should agree and there 
should be no warning.

When invalidation (evict_userptr) happen concurrently restore process 
will schedule next attempt. The mem->invalid is set to true by 
evict_userptr, also the sequence numbers. Both agree and with this patch 
we do not see WARN.
The warning messages printed in confirm_valid_user_pages_locked 
indicate that there is a mismatch between the sequence numbers and 
mem->invalid. As I understand it, such a mismatch should be 
impossible. Unless there are some bad assumptions in the code. I 
haven't figured out what those bad assumptions are yet. Other than the 
case for -EFAULT you pointed out above.

From my debugging this customer app the warnings is due to we did not 
handle well if amdgpu_hmm_range_get_pages return -EFAULT and mmu 
notifier happen on same buffer concurrently. That lead we use mem 
without hmm range associated and stale mem->range->notifier_seq for 
following operations. With this patch there is no warning messages and 
not see other errors.

Xiaogang

Regards,
  Felix



Signed-off-by: Xiaogang Chen<Xiaogang.Chen@xxxxxxx>
---
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 45 +++++++++++++++----
  1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 7b1f5933ebaa..6881f1b0844c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2444,7 +2444,9 @@ static int update_invalid_user_pages(struct 
amdkfd_process_info *process_info,
              ret = -EAGAIN;
              goto unlock_out;
          }
-        mem->invalid = 0;
+         /* set mem valid if mem has hmm range associated */
+        if (mem->range)
+            mem->invalid = 0;
      }
    unlock_out:
@@ -2576,16 +2578,28 @@ static int 
confirm_valid_user_pages_locked(struct amdkfd_process_info *process_i
      list_for_each_entry_safe(mem, tmp_mem,
                   &process_info->userptr_inval_list,
                   validate_list.head) {
-        bool valid = amdgpu_ttm_tt_get_user_pages_done(
-                mem->bo->tbo.ttm, mem->range);
+        /* Only check mem with hmm range associated */
+        bool valid;
  -        mem->range = NULL;
-        if (!valid) {
-            WARN(!mem->invalid, "Invalid BO not marked invalid");
+        if (mem->range) {
+            valid = amdgpu_ttm_tt_get_user_pages_done(
+                    mem->bo->tbo.ttm, mem->range);
+
+            mem->range = NULL;
+            if (!valid) {
+                WARN(!mem->invalid, "Invalid BO not marked invalid");
+                ret = -EAGAIN;
+                continue;
+            }
+        } else
+            /* keep mem without hmm range at userptr_inval_list */
+            continue;
+
+        if (mem->invalid) {
+            WARN(1, "Valid BO is marked invalid");
              ret = -EAGAIN;
              continue;
          }
-        WARN(mem->invalid, "Valid BO is marked invalid");
            list_move_tail(&mem->validate_list.head,
                     &process_info->userptr_valid_list);
@@ -2644,8 +2658,23 @@ static void 
amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
       * reference counting inside KFD will handle this case.
       */
      mutex_lock(&process_info->notifier_lock);
-    if (process_info->evicted_bos != evicted_bos)
+    if (process_info->evicted_bos != evicted_bos) {
+        /* mmu notifier interrupted 
amdgpu_amdkfd_restore_userptr_worker
+         * before reschedule next attempt update stale 
mem->range->notifier_seq
+         * inside userptr_inval_list
+         */
+        struct kgd_mem *mem, *tmp_mem;
+
+        list_for_each_entry_safe(mem, tmp_mem,
+                &process_info->userptr_inval_list,
+                validate_list.head) {
+
+            if (mem->range)
+                mem->range->notifier_seq = 
mem->range->notifier->invalidate_seq;
+        }
+
          goto unlock_notifier_out;
+    }
        if (confirm_valid_user_pages_locked(process_info)) {
          WARN(1, "User pages unexpectedly invalid");