On Fri, Jun 3, 2022 at 10:11 AM Christian König <christian.koenig@xxxxxxx> wrote:
>
> On 03.06.22 at 03:21, Bas Nieuwenhuizen wrote:
> > [SNIP]
> >> The problem is we need to wait on fences *not* added to the buffer object.
> > What fences wouldn't be added to the buffer object that we need here?
>
> Basically all still running submissions from the VM which could
> potentially access the BO.
>
> That's why we have the AMDGPU_SYNC_EQ_OWNER in amdgpu_vm_update_range().
>
> >> E.g. what we currently do here while freeing memory is:
> >> 1. Update the PTEs and make that update wait for everything!
> >> 2. Add the fence of that update to the freed-up BO so that this BO isn't
> >> freed before the next CS.
> >>
> >> We might be able to fix this by adding the fences to the BO before
> >> freeing it manually, but I'm not 100% sure we can actually allocate
> >> memory for the fences at that moment.
> > I think we don't need to be able to. We're already adding the unmap
> > fence to the BO in the gem close ioctl, and that has the fallback that
> > if we can't allocate space for the fence in the BO, we wait on the
> > fence manually on the CPU. I think that is a reasonable fallback for
> > this as well?
>
> Yes, just blocking might work in an OOM situation as well.
>
> > For the TTM move path amdgpu_copy_buffer will wait on the BO resv, and
> > then following submissions will trigger VM updates that will wait on
> > the amdgpu_copy_buffer jobs (and hence, transitively, on the work).
> > AFAICT amdgpu_bo_move does not trigger any VM updates by itself, and
> > amdgpu_bo_move_notify runs way after the move (and after the
> > ttm_bo_move_accel_cleanup which would free the old resource), so any
> > VM changes triggered by that would see the TTM copy and sync to it.
> >
> > I do have to fix some stuff indeed, especially for the GEM close, but
> > with that we should be able to keep the same basic approach?
>
> Nope, not even remotely.
>
> What we need is the following:
> 1. Rolling out my drm_exec patch set, so that we can lock buffers as needed.
> 2. When we get a VM operation we not only lock the VM page tables, but
> also all buffers we potentially need to unmap.
> 3. Nuking the freed list in the amdgpu_vm structure by updating freed
> areas directly when they are unmapped.
> 4. Tracking those updates inside the bo_va structure for the BO+VM
> combination.
> 5. When the bo_va structure is destroyed because of closing the handle,
> move the last clear operation over to the VM as implicit sync.
>

Hi Christian,

isn't that a different problem, though (one that we're also trying to
solve, but via your series)?

What this patch tries to achieve is to let

(t+0) CS submission setting BOOKKEEP fences (i.e. no implicit sync)
(t+1) a VM operation on a BO/VM accessed by the CS

run concurrently. What it *doesn't* try is to let

(t+0) a VM operation on a BO/VM accessed by the CS
(t+1) CS submission setting BOOKKEEP fences (i.e. no implicit sync)

run concurrently. When you write

> Only when all this is done can we then resolve the dependency that the
> CS currently must wait for any clear operation on the VM.

isn't that all about the second problem?

> Regards,
> Christian.
>
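To make the gem-close fallback mentioned above concrete, here is a rough
sketch of the pattern, not the actual amdgpu code; the helper name is
hypothetical and the usage flag is chosen for illustration only:

	#include <linux/dma-resv.h>
	#include <linux/dma-fence.h>

	/*
	 * Illustrative helper (hypothetical name): try to attach the
	 * unmap/VM-update fence to the BO's reservation object so the BO
	 * cannot be freed before the update finishes; if no fence slot can
	 * be allocated (e.g. under memory pressure), fall back to blocking
	 * on the fence on the CPU.
	 *
	 * The caller is assumed to hold the reservation lock, as the GEM
	 * close path does for the BO being closed.
	 */
	static void example_attach_or_wait(struct dma_resv *resv,
					   struct dma_fence *fence)
	{
		if (dma_resv_reserve_fences(resv, 1) == 0) {
			/* Slot available: track the fence so the BO outlives
			 * the pending update. Usage flag is illustrative; the
			 * right choice depends on who must see this fence. */
			dma_resv_add_fence(resv, fence, DMA_RESV_USAGE_BOOKKEEP);
		} else {
			/* OOM fallback: just block until the update signals. */
			dma_fence_wait(fence, false);
		}
	}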
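And for the (t+0)/(t+1) ordering above, the point is only that a fence
added with DMA_RESV_USAGE_BOOKKEEP is skipped by implicit-sync waits, so
a later VM operation that syncs solely to the implicit-sync usages does
not wait for the explicitly-synced CS. A minimal sketch, again not the
actual amdgpu code paths, with hypothetical helper names:

	#include <linux/dma-resv.h>
	#include <linux/dma-fence.h>
	#include <linux/sched.h>

	/* Caller holds the reservation lock and has reserved a fence slot. */
	static void example_add_explicit_cs_fence(struct dma_resv *resv,
						  struct dma_fence *cs_fence)
	{
		/* BOOKKEEP = tracked for bookkeeping, ignored by implicit sync. */
		dma_resv_add_fence(resv, cs_fence, DMA_RESV_USAGE_BOOKKEEP);
	}

	static long example_implicit_sync_wait(struct dma_resv *resv)
	{
		/* Waits for KERNEL/WRITE/READ fences only; BOOKKEEP fences
		 * are not included, which is what lets the VM op run
		 * concurrently with the explicitly-synced CS. */
		return dma_resv_wait_timeout(resv, DMA_RESV_USAGE_READ, false,
					     MAX_SCHEDULE_TIMEOUT);
	}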