On Fri, Jun 3, 2022 at 10:11 AM Christian König <christian.koenig@xxxxxxx> wrote:
>
> On 03.06.22 at 03:21, Bas Nieuwenhuizen wrote:
> > [SNIP]
> >> The problem is we need to wait on fences *not* added to the buffer object.
> > What fences wouldn't be added to the buffer object that we need here?
>
> Basically all still running submissions from the VM which could
> potentially access the BO.
>
> That's why we have the AMDGPU_SYNC_EQ_OWNER in amdgpu_vm_update_range().
>
> >> E.g. what we currently do here while freeing memory is:
> >> 1. Update the PTEs and make that update wait for everything!
> >> 2. Add the fence of that update to the freed-up BO so that this BO isn't
> >> freed before the next CS.
> >>
> >> We might be able to fix this by adding the fences to the BO before
> >> freeing it manually, but I'm not 100% sure we can actually allocate
> >> memory for the fences at that moment.
> > I think we don't need to be able to. We're already adding the unmap
> > fence to the BO in the gem close ioctl, and that has the fallback that
> > if we can't allocate space for the fence in the BO, we wait on the
> > fence manually on the CPU. I think that is a reasonable fallback for
> > this as well?
>
> Yes, just blocking might work in an OOM situation as well.
>
> > For the TTM move path amdgpu_copy_buffer will wait on the BO resv, and
> > then following submissions will trigger VM updates that will wait on
> > the amdgpu_copy_buffer jobs (and hence, transitively, on the work).
> > AFAICT amdgpu_bo_move does not trigger any VM updates by itself, and
> > amdgpu_bo_move_notify runs way after the move (and after the
> > ttm_bo_move_accel_cleanup which would free the old resource), so any
> > VM changes triggered by that would see the TTM copy and sync to it.
> >
> > I do have to fix some stuff indeed, especially for the GEM close, but
> > with that we should be able to keep the same basic approach?
>
> Nope, not even remotely.
>
> What we need is the following:
> 1. Rolling out my drm_exec patch set, so that we can lock buffers as needed.
> 2. When we get a VM operation we not only lock the VM page tables, but
> also all buffers we potentially need to unmap.
> 3. Nuking the freed list in the amdgpu_vm structure by updating freed
> areas directly when they are unmapped.
> 4. Tracking those updates inside the bo_va structure for the BO+VM
> combination.
> 5. When the bo_va structure is destroyed because of closing the handle,
> move the last clear operation over to the VM as implicit sync.
>

Hi Christian,

isn't that a different problem, though (one that we're also trying to
solve, but via your series)?

What this patch tries to achieve is to let

(t+0) CS submission setting BOOKKEEP fences (i.e. no implicit sync)
(t+1) a VM operation on a BO/VM accessed by the CS

run concurrently. What it *doesn't* try is to let

(t+0) a VM operation on a BO/VM accessed by the CS
(t+1) CS submission setting BOOKKEEP fences (i.e. no implicit sync)

run concurrently. When you write

> Only when all this is done can we then resolve the dependency that the
> CS currently must wait for any clear operation on the VM.

isn't that all about the second problem?

> Regards,
> Christian.
>
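To make the gem-close fallback mentioned above concrete, here is a rough
sketch of the pattern, not the actual amdgpu code; the helper name is
hypothetical and the usage flag is chosen for illustration only:

	#include <linux/dma-resv.h>
	#include <linux/dma-fence.h>

	/*
	 * Illustrative helper (hypothetical name): try to attach the
	 * unmap/VM-update fence to the BO's reservation object so the BO
	 * cannot be freed before the update finishes; if no fence slot can
	 * be allocated (e.g. under memory pressure), fall back to blocking
	 * on the fence on the CPU.
	 *
	 * The caller is assumed to hold the reservation lock, as the GEM
	 * close path does for the BO being closed.
	 */
	static void example_attach_or_wait(struct dma_resv *resv,
					   struct dma_fence *fence)
	{
		if (dma_resv_reserve_fences(resv, 1) == 0) {
			/* Slot available: track the fence so the BO outlives
			 * the pending update. Usage flag is illustrative; the
			 * right choice depends on who must see this fence. */
			dma_resv_add_fence(resv, fence, DMA_RESV_USAGE_BOOKKEEP);
		} else {
			/* OOM fallback: just block until the update signals. */
			dma_fence_wait(fence, false);
		}
	}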
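And for the (t+0)/(t+1) ordering above, the point is only that a fence
added with DMA_RESV_USAGE_BOOKKEEP is skipped by implicit-sync waits, so
a later VM operation that syncs solely to the implicit-sync usages does
not wait for the explicitly-synced CS. A minimal sketch, again not the
actual amdgpu code paths, with hypothetical helper names:

	#include <linux/dma-resv.h>
	#include <linux/dma-fence.h>
	#include <linux/sched.h>

	/* Caller holds the reservation lock and has reserved a fence slot. */
	static void example_add_explicit_cs_fence(struct dma_resv *resv,
						  struct dma_fence *cs_fence)
	{
		/* BOOKKEEP = tracked for bookkeeping, ignored by implicit sync. */
		dma_resv_add_fence(resv, cs_fence, DMA_RESV_USAGE_BOOKKEEP);
	}

	static long example_implicit_sync_wait(struct dma_resv *resv)
	{
		/* Waits for KERNEL/WRITE/READ fences only; BOOKKEEP fences
		 * are not included, which is what lets the VM op run
		 * concurrently with the explicitly-synced CS. */
		return dma_resv_wait_timeout(resv, DMA_RESV_USAGE_READ, false,
					     MAX_SCHEDULE_TIMEOUT);
	}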