Re: [PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved

Felix Kuehling <felix.kuehling@xxxxxxx> · Wed, 17 Jul 2024 12:27:52 -0400

On 2024-07-15 08:39, Christian König wrote:
Hi Felix,

yes that is a perfectly expected consequence.

The last time we talked about it the problem to solve this was that 
amdgpu_vm_sdma_prepare() couldn't read the fences from a resv object 
which wasn't locked.
Why only amdgpu_vm_sdma_prepare? Doesn't CPU page table update have the 
same problem?

That happens both during amdgpu_vm_handle_moved() as well as unlocked 
in validations of the page tables.
By "unlocked validations of page table entries" do you mean the 
"unlocked" flag in amdgpu_vm_update_range? That should only be used for 
invalidating page table entries in MMU notifiers for SVM ranges. It 
should not affect normal BOs.
amdgpu_vm_handle_moved tries to lock the reservations. But if it fails, 
it clears page table entries. So this is another case of "unlocked 
invalidations". This one does affect normal BOs. I think 
amdgpu_vm_handle_moved makes an assumption that the other user of the BO 
is in a different VM but not the same VM. Clearing the PTEs in this VM 
even though the BO move is still waiting for some other VM to finish 
accessing it, is safe.
I think here we have a case where the BO is used by something else in 
the same VM. In this case we cannot safely clear the PTEs before the BO 
fence signals.
We want to clear the PTEs before the move happens. Otherwise we risk 
memory corruption. Maybe the same job that does the move blit should 
also invalidate the PTEs?
Regards,
  Felix


IIRC we postponed looking into the issue until it really becomes a 
problem which is probably now :)
Regards,
Christian.

Am 12.07.24 um 16:56 schrieb Felix Kuehling:
KFD eviction fences are triggered by the enable_signaling callback on 
the eviction fence. Any move operations scheduled by amdgpu_bo_move 
are held up by the GPU scheduler until the eviction fence is signaled 
by the KFD eviction handler, which only happens after the user mode 
queues are stopped.
As I understand it, VM BO invalidation does not unmap anything from 
the page table itself. So the KFD queues are OK continue running 
until the eviction handler stops them and signals the fence.
However, if amdgpu_vm_handle_moved gets called before the eviction 
fence is signaled, then there could be a problem. In applications 
that do compute-graphics interop, the VM is shared between compute 
and graphics. So graphics and compute submissions at the same time 
are possible. @Christian, this is a concequence of using libdrm and 
insisting that each process uses only a single VM per GPU.
Regards,
   Felix

On 2024-07-12 3:39, Christian König wrote:
Hi River,

well that isn't an error at all, this is perfectly expected behavior.

The VMs used by the KFD process are currently not meant to be used 
by classic CS at the same time.
This is one of the reasons for that.

Regards,
Christian.

Am 12.07.24 um 09:35 schrieb YuanShang Mao (River):
[AMD Official Use Only - AMD Internal Distribution Only]

Add more info and CC @Kuehling, Felix @cao, lin

In amdgpu_amdkfd_fence.c, there is a design description:

/* Eviction Fence
   * Fence helper functions to deal with KFD memory eviction.
   * Big Idea - Since KFD submissions are done by user queues, a BO 
cannot be
   *  evicted unless all the user queues for that process are evicted.
   *
   * All the BOs in a process share an eviction fence. When process 
X wants
   * to map VRAM memory but TTM can't find enough space, TTM will 
attempt to
   * evict BOs from its LRU list. TTM checks if the BO is valuable 
to evict
   * by calling ttm_device_funcs->eviction_valuable().
   *
   * ttm_device_funcs->eviction_valuable() - will return false if 
the BO belongs
   *  to process X. Otherwise, it will return true to indicate BO 
can be
   *  evicted by TTM.
   *
   * If ttm_device_funcs->eviction_valuable returns true, then TTM 
will continue
   * the evcition process for that BO by calling ttm_bo_evict --> 
amdgpu_bo_move
   * --> amdgpu_copy_buffer(). This sets up job in GPU scheduler.
   *
   * GPU Scheduler (amd_sched_main) - sets up a cb 
(fence_add_callback) to
   *  nofity when the BO is free to move. fence_add_callback --> 
enable_signaling
   *  --> amdgpu_amdkfd_fence.enable_signaling
   *
   * amdgpu_amdkfd_fence.enable_signaling - Start a work item that 
will quiesce
   * user queues and signal fence. The work item will also start 
another delayed
   * work item to restore BOs
   */

If mark BOs as invalidated before submitting job to move the 
buffer, user queue is still active.
During the time before user queue is evicted, if a drm job achieve, 
amdgpu_cs_vm_handling will call amdgpu_vm_handle_moved to clear the 
ptes of
Invalidated BOs. Then page fault happens because compute shader is 
still accessing the "invalidated" BO.
I am not familiar with amdgpu_vm_bo state machine, so I don’t know 
if it is an code error or an design error.
Thanks
River


-----Original Message-----
From: YuanShang Mao (River)
Sent: Friday, July 12, 2024 10:55 AM
To: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Cc: Huang, Trigger <Trigger.Huang@xxxxxxx>; 
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: RE: [PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved

We need to make sure that all BOs of an active kfd process 
validated. Moving buffer will trigger process eviction.
If mark it as invalided before process eviction, related kfd 
process is still active and may attempt to access this invalidated BO.
Agree with Trigger. Seems kfd eviction should been synced to move 
notify, not the move action.
Thanks
River

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Sent: Thursday, July 11, 2024 8:39 PM
To: Huang, Trigger <Trigger.Huang@xxxxxxx>; YuanShang Mao (River) 
<YuanShang.Mao@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved

Yeah, completely agree. This patch doesn't really make sense.

Please explain why you would want to do this?

Regards,
Christian.

Am 11.07.24 um 13:56 schrieb Huang, Trigger:
[AMD Official Use Only - AMD Internal Distribution Only]

This patch seems to be wrong.
Quite a lot of preparations have been done in amdgpu_bo_move_notify
For example, amdgpu_bo_kunmap() will be called to prevent the BO 
from being accessed by CPU. If not called, the CPU may attempt to 
access the BO while it is being moved.
Thanks,
Trigger

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
YuanShang
Sent: Thursday, July 11, 2024 5:10 PM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: YuanShang Mao (River) <YuanShang.Mao@xxxxxxx>; YuanShang Mao
(River) <YuanShang.Mao@xxxxxxx>
Subject: [PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved

Caution: This message originated from an External Source. Use proper
caution when opening attachments, clicking links, or responding.


It leads to race condition if amdgpu_bo is marked as invalid before
it is really moved.

Signed-off-by: YuanShang <YuanShang.Mao@xxxxxxx>
---
    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 10 +++++-----
    1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 29e4b5875872..a29d5132ad3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -519,8 +519,8 @@ static int amdgpu_bo_move(struct
ttm_buffer_object *bo, bool evict,

           if (!old_mem || (old_mem->mem_type == TTM_PL_SYSTEM &&
                            bo->ttm == NULL)) {
-               amdgpu_bo_move_notify(bo, evict, new_mem);
                   ttm_bo_move_null(bo, new_mem);
+               amdgpu_bo_move_notify(bo, evict, new_mem);
                   return 0;
           }
           if (old_mem->mem_type == AMDGPU_GEM_DOMAIN_DGMA || @@ -
530,8 +530,8 @@ static int amdgpu_bo_move(struct ttm_buffer_object
*bo, bool evict,
           if (old_mem->mem_type == TTM_PL_SYSTEM &&
               (new_mem->mem_type == TTM_PL_TT ||
                new_mem->mem_type == AMDGPU_PL_PREEMPT)) {
-               amdgpu_bo_move_notify(bo, evict, new_mem);
                   ttm_bo_move_null(bo, new_mem);
+               amdgpu_bo_move_notify(bo, evict, new_mem);
                   return 0;
           }
           if ((old_mem->mem_type == TTM_PL_TT || @@ -542,9 
+542,9 @@
static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict,
                           return r;

amdgpu_ttm_backend_unbind(bo->bdev, bo->ttm);
-               amdgpu_bo_move_notify(bo, evict, new_mem);
                   ttm_resource_free(bo, &bo->resource);
                   ttm_bo_assign_mem(bo, new_mem);
+               amdgpu_bo_move_notify(bo, evict, new_mem);
                   return 0;
           }

@@ -557,8 +557,8 @@ static int amdgpu_bo_move(struct
ttm_buffer_object *bo, bool evict,
               new_mem->mem_type == AMDGPU_PL_OA ||
               new_mem->mem_type == AMDGPU_PL_DOORBELL) {
                   /* Nothing to save here */
-               amdgpu_bo_move_notify(bo, evict, new_mem);
                   ttm_bo_move_null(bo, new_mem);
+               amdgpu_bo_move_notify(bo, evict, new_mem);
                   return 0;
           }

@@ -583,11 +583,11 @@ static int amdgpu_bo_move(struct
ttm_buffer_object *bo, bool evict,
                   return -EMULTIHOP;
           }

-       amdgpu_bo_move_notify(bo, evict, new_mem);
           if (adev->mman.buffer_funcs_enabled)
                   r = amdgpu_move_blit(bo, evict, new_mem, 
old_mem);
           else
                   r = -ENODEV;
+       amdgpu_bo_move_notify(bo, evict, new_mem);

           if (r) {
                   /* Check that all memory is CPU accessible */
--
2.25.1