On 01.06.22 at 10:48, Bas Nieuwenhuizen wrote:
On Wed, Jun 1, 2022 at 10:40 AM Christian König
<christian.koenig@xxxxxxx> wrote:
On 01.06.22 at 10:16, Bas Nieuwenhuizen wrote:
On Wed, Jun 1, 2022 at 10:03 AM Christian König
<christian.koenig@xxxxxxx> wrote:
On 01.06.22 at 02:40, Bas Nieuwenhuizen wrote:
This should be okay because moves themselves use KERNEL usage and
hence still sync with BOOKKEEP usage. Then any later submits still
wait on any pending VM operations.
(i.e. we only made VM ops not wait on BOOKKEEP submits, not the other
way around)
Well NAK again. This allows access to freed up memory and is a complete
no-go.
How does this allow access to freed memory? The worst I can see is that
the unmap happens earlier if the app/driver gets the waits wrong,
which wouldn't give access after the underlying BO is freed?
To free up memory we need to update the PTEs and then flush those out by
invalidating the TLB.
On gfx6, gfx7, gfx8 and some broken gfx10 hw, invalidating the TLB can
only be done while the VMID is idle.
Only gfx9 can reliably invalidate the TLB while it is in use, and even
there it comes with quite a performance penalty (a TLB invalidation
can take multiple seconds).
Because of this, what we do in the kernel driver is to sync to everything
when we unmap entries:

	/* Implicitly sync to command submissions in the same VM before
	 * unmapping; only sync explicitly when mapping.
	 */
	if (!(flags & AMDGPU_PTE_VALID))
		sync_mode = AMDGPU_SYNC_EQ_OWNER;
	else
		sync_mode = AMDGPU_SYNC_EXPLICIT;
This acts as a barrier for freeing the memory. In other words, we
intentionally add a bubble which syncs to everything.
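[To make the mode names concrete, here is a condensed, hedged sketch of the filter deciding whether a reservation fence must be waited on, loosely following amdgpu's amdgpu_sync_test_fence(); the real function has extra special cases, e.g. moves are always synced to:

	/* Simplified sketch, not the verbatim kernel code.  "owner" is the
	 * VM issuing the page-table update, "fence_owner" the owner of the
	 * fence found in the reservation object.
	 */
	static bool must_sync(enum amdgpu_sync_mode mode,
			      void *owner, void *fence_owner)
	{
		switch (mode) {
		case AMDGPU_SYNC_ALWAYS:
			return true;			/* wait on everything */
		case AMDGPU_SYNC_NE_OWNER:
			return fence_owner != owner;
		case AMDGPU_SYNC_EQ_OWNER:
			return fence_owner == owner;	/* the unmap case above */
		case AMDGPU_SYNC_EXPLICIT:
			return false;			/* explicit dependencies only */
		}
		return true;
	}

So with AMDGPU_SYNC_EQ_OWNER the page-table update waits for every previous submission of the same VM before the PTEs are invalidated.]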
I've been working for months on a concept for how to do all this without
causing the stalls you observe, but so far I haven't come to much of a
conclusion.
That might cause an unmap operation to happen too early, but for freeing
the actual backing memory we still wait for all fences on the BO to
finish first, no? In that case, since BOOKKEEP fences are still added
for explicit sync, that should not be a problem, no?
(If not, that sounds like the obvious fix for making this work?)
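[The invariant Bas is relying on here would amount to something like the following wait before the backing store is actually released; a sketch of the idea only, the real path being TTM's delayed-delete logic rather than this helper:

	#include <linux/dma-resv.h>
	#include <linux/sched.h>	/* MAX_SCHEDULE_TIMEOUT */

	/* Sketch: before really freeing the pages, wait for *all* fences
	 * on the BO, BOOKKEEP ones included.
	 */
	static long wait_before_release(struct dma_resv *resv)
	{
		return dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP,
					     false, MAX_SCHEDULE_TIMEOUT);
	}
]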
The problem is we need to wait on fences *not* added to the buffer object.
E.g. what we currently do here while freeing memory is:
1. Update the PTEs and make that update wait for everything!
2. Add the fence of that update to the freed up BO so that this BO isn't
freed before the next CS.
We might be able to fix this by manually adding the fences to the BO
before freeing it, but I'm not 100% sure we can actually allocate
memory for the fences at that moment.
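[The allocation concern maps directly onto the reservation API: dma_resv_add_fence() itself never allocates, but it needs a slot reserved beforehand, and reserving one can allocate and fail. A hedged sketch of the "add the fences manually" idea:

	#include <linux/dma-resv.h>

	/* Sketch: attach the PT-update fence to the BO before freeing it.
	 * dma_resv_reserve_fences() may allocate and thus return -ENOMEM,
	 * which is exactly the worry in the memory-free path.
	 * Caller must hold the reservation lock.
	 */
	static int add_update_fence(struct dma_resv *resv, struct dma_fence *fence)
	{
		int r = dma_resv_reserve_fences(resv, 1);

		if (r)
			return r;

		dma_resv_add_fence(resv, fence, DMA_RESV_USAGE_KERNEL);
		return 0;
	}
]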
Regards,
Christian.
Signed-off-by: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
index f10332e1c6c0..31bc73fd1fae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
@@ -51,7 +51,7 @@ static int amdgpu_vm_cpu_prepare(struct amdgpu_vm_update_params *p,
 	if (!resv)
 		return 0;
 
-	return amdgpu_bo_sync_wait_resv(p->adev, resv, sync_mode, sync_mode, p->vm, true);
+	return amdgpu_bo_sync_wait_resv(p->adev, resv, sync_mode, AMDGPU_SYNC_EXPLICIT, p->vm, true);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index 63b484dc76c5..c8d5898bea11 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -75,7 +75,7 @@ static int amdgpu_vm_sdma_prepare(struct amdgpu_vm_update_params *p,
 	if (!resv)
 		return 0;
 
-	return amdgpu_sync_resv(p->adev, &p->job->sync, resv, sync_mode, sync_mode, p->vm);
+	return amdgpu_sync_resv(p->adev, &p->job->sync, resv, sync_mode, AMDGPU_SYNC_EXPLICIT, p->vm);
 }
 
 /**