Re: [RFC PATCH 3/5] drm/amdgpu: Allow explicit sync for VM ops.

On 03.06.22 at 03:21, Bas Nieuwenhuizen wrote:
[SNIP]
The problem is we need to wait on fences *not* added to the buffer object.
What fences wouldn't be added to the buffer object that we need here?

Basically all still running submissions from the VM which could potentially access the BO.

That's why we have the AMDGPU_SYNC_EQ_OWNER in amdgpu_vm_update_range().
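For reference, the sync mode decides which reservation fences a VM update waits on by comparing fence owners. A minimal sketch of that filtering, using the amdgpu_sync_mode values from amdgpu_sync.h; the real check in amdgpu_sync.c also compares devices and has KFD special cases, and the wrapper name here is made up for illustration:

    /* Simplified sketch of the owner-based fence filtering. */
    static bool sync_test_fence_sketch(enum amdgpu_sync_mode mode,
                                       void *owner, void *fence_owner)
    {
            switch (mode) {
            case AMDGPU_SYNC_ALWAYS:
                    return true;                 /* wait on everything */
            case AMDGPU_SYNC_NE_OWNER:
                    return fence_owner != owner; /* wait on other owners only */
            case AMDGPU_SYNC_EQ_OWNER:
                    return fence_owner == owner; /* wait on this VM's own jobs */
            case AMDGPU_SYNC_EXPLICIT:
                    return false;                /* explicit sync handles it */
            }
            return true;
    }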

E.g. what we currently do here while freeing memory is:
1. Update the PTEs and make that update wait for everything!
2. Add the fence of that update to the freed up BO so that this BO isn't
freed before the next CS.
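Condensed into a sketch (amdgpu_vm_clear_freed() and amdgpu_bo_fence() are the real helpers, but the flow here is simplified and the wrapper function is made up for illustration):

    /* Caller holds the BO's reservation lock. */
    static int free_path_sketch(struct amdgpu_device *adev,
                                struct amdgpu_vm *vm, struct amdgpu_bo *bo)
    {
            struct dma_fence *fence = NULL;
            int r;

            /* 1. Clear the PTEs; the update syncs to everything still
             *    running on this VM.
             */
            r = amdgpu_vm_clear_freed(adev, vm, &fence);
            if (r)
                    return r;

            /* 2. Attach the update fence to the freed BO so its backing
             *    memory cannot be reused before the clear has finished.
             */
            amdgpu_bo_fence(bo, fence, true);
            dma_fence_put(fence);
            return 0;
    }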

We might be able to fix this by adding the fences to the BO before
freeing it manually, but I'm not 100% sure we can actually allocate
memory for the fences in that moment.
I think we don't need to be able to. We're already adding the unmap
fence to the BO in the gem close ioctl, and that has the fallback that
if we can't allocate space for the fence in the BO, we wait on the
fence manually on the CPU. I think that is a reasonable fallback for
this as well?

Yes, just blocking might work in an OOM situation as well.
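The pattern in question, sketched with the generic dma_resv/dma_fence helpers (and the newer dma_resv usage flags) rather than the actual amdgpu_gem_object_close() code; the function name is made up for illustration:

    /* Caller holds the BO's reservation lock. */
    static void attach_or_block_sketch(struct dma_resv *resv,
                                       struct dma_fence *fence)
    {
            if (dma_resv_reserve_fences(resv, 1) == 0) {
                    dma_resv_add_fence(resv, fence, DMA_RESV_USAGE_BOOKKEEP);
            } else {
                    /* OOM fallback: just block until the unmap finished. */
                    dma_fence_wait(fence, false);
            }
    }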

For the TTM move path, amdgpu_copy_buffer will wait on the BO resv, and the following submissions will then trigger VM updates that wait on the amdgpu_copy_buffer jobs (and hence, transitively, on the work). AFAICT amdgpu_bo_move does not trigger any VM updates by itself, and amdgpu_bo_move_notify runs well after the move (and after ttm_bo_move_accel_cleanup, which would free the old resource), so any VM changes triggered by that would see the TTM copy and sync to it.

I do have to fix some stuff indeed, especially for the GEM close, but with that we should be able to keep the same basic approach?

Nope, not even remotely.

What we need is the following:
1. Rolling out my drm_exec patch set, so that we can lock buffers as needed.
2. When we get a VM operation, we not only lock the VM page tables, but also all buffers we potentially need to unmap.
3. Nuking the freed list in the amdgpu_vm structure by updating freed areas directly when they are unmapped.
4. Tracking those updates inside the bo_va structure for the BO+VM combination.
5. When the bo_va structure is destroyed because of closing the handle, moving the last clear operation over to the VM as implicit sync.

Only when all of this is done can we resolve the dependency that currently forces the CS to wait for any clear operation on the VM.
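For points 1 and 2, the locking loop would look roughly like the sketch below, based on the drm_exec API as it later landed upstream (the patch set under discussion may differ in detail); lock_vm_and_bos() and the objs array are made up for illustration:

    static int lock_vm_and_bos(struct drm_exec *exec,
                               struct drm_gem_object **objs, unsigned int count)
    {
            unsigned int i;
            int ret;

            drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT, count);
            drm_exec_until_all_locked(exec) {
                    /* Lock the VM page tables and every BO we may need to
                     * unmap, restarting from scratch on contention.
                     */
                    for (i = 0; i < count; i++) {
                            ret = drm_exec_lock_obj(exec, objs[i]);
                            drm_exec_retry_on_contention(exec);
                            if (ret)
                                    goto error;
                    }
            }
            return 0;        /* caller calls drm_exec_fini() when done */

    error:
            drm_exec_fini(exec);
            return ret;
    }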

Regards,
Christian.




