On Wed, Jun 15, 2022 at 9:00 AM Christian König <christian.koenig@xxxxxxx> wrote:
>
> Am 06.06.22 um 13:00 schrieb Bas Nieuwenhuizen:
> > On Mon, Jun 6, 2022 at 12:35 PM Christian König
> > <christian.koenig@xxxxxxx> wrote:
> >> [SNIP]
> >> That part won't work at all and would cause additional synchronization
> >> problems.
> >>
> >> First of all, for implicit synced CS we should use READ, not BOOKKEEP,
> >> because BOOKKEEP would incorrectly be ignored by OpenGL importers. I've
> >> fixed it so that this no longer causes memory corruption, but it is
> >> still nice to avoid.
> > Yes, what I'm saying is that for implicit sync, CS submission should add
> > READ fences to the dma resv, and for explicit sync, CS submission should
> > add BOOKKEEP fences.
>
> No, exactly that is wrong.
>
> Implicit CS submissions should add WRITE fences.
>
> Explicit CS submissions should add READ fences.
>
> Only VM updates should add BOOKKEEP fences.
>
> >> BOOKKEEP can only be used by VM updates themselves, so that they don't
> >> interfere with CS.
> > That is exactly why we would go BOOKKEEP for explicit sync CS
> > submissions, no? An explicit submission shouldn't interfere with any
> > other CS submissions. That includes being totally ignored by GL
> > importers (if we want synchronization there between an explicit
> > submission and GL, userspace is expected to use Jason's dma-buf fence
> > import/export IOCTLs).
>
> No, that would break existing DMA-buf rules.
>
> Explicit CS submissions are still a dependency for implicit submissions.

This is exactly what we don't want for explicit submissions, and it is why
I waited with this series until the DMA_RESV_USAGE series landed. We want
to opt out of implicit sync completely and just use the IOCTLs Jason wrote
for backwards compatibility with windowing systems that need it. If
BOOKKEEP isn't meant for that, should we add a new USAGE?
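To make that concrete, here is a rough sketch of the kind of CS-side logic
I have in mind (illustrative only: the helper name and the explicit_sync
parameter are made up, it just leans on the dma_resv_reserve_fences() /
dma_resv_add_fence() API from the DMA_RESV_USAGE series):

#include <linux/dma-resv.h>
#include <linux/dma-fence.h>

/*
 * Sketch: the CS path decides the dma_resv usage from a per-BO
 * explicit sync flag, so the VM code never has to know about it.
 */
static int sketch_cs_attach_fence(struct dma_resv *resv,
                                  struct dma_fence *fence,
                                  bool explicit_sync)
{
        /*
         * BOOKKEEP fences are ignored by implicit sync importers,
         * READ fences are not; this is where the opt-out happens.
         */
        enum dma_resv_usage usage = explicit_sync ?
                DMA_RESV_USAGE_BOOKKEEP : DMA_RESV_USAGE_READ;
        int r;

        dma_resv_assert_held(resv);

        r = dma_resv_reserve_fences(resv, 1);
        if (r)
                return r;

        dma_resv_add_fence(resv, fence, usage);
        return 0;
}

Whether the non-explicit case should be READ or WRITE is exactly the
disagreement above; the only point of the sketch is that the decision
lives in the CS path, not in the VM code.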
>
>
> >
> > Then the second problem is that the VM IOCTL has absolutely no idea what
> > the CS IOCTL would be doing. That's why we have added the EXPLICIT sync
> > flag on the BO.
> > It doesn't need to? We just use a different sync_mode for BOOKKEEP
> > fences vs others:
> > https://patchwork.freedesktop.org/patch/487887/?series=104578&rev=2
>
> No, exactly that's completely broken.
>
> Regards,
> Christian.
>
> >
> > (the nice thing about doing it this way is that it is independent of
> > the IOCTL, i.e. it also works for the delayed mapping changes we trigger
> > on CS submit)
> >
> >> Regards,
> >> Christian.
> >>
> >>>> That should be doable, but then you don't need any of the other changes.
> >>>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>>> #1 is rather easy to fix: you just need to copy all dma_fences from the
> >>>>>> page table dma_resv object over to the BO's dma_resv object in the gem
> >>>>>> close handler, e.g. exactly what you suggested with the dma_resv_copy
> >>>>>> function.
> >>>>>>
> >>>>>> #2 is a nightmare.
> >>>>>>
> >>>>>> We can't move the TLB flush to the end of the unmap operation because
> >>>>>> async TLB flushes are either a bit complicated (double flushes etc.) or
> >>>>>> don't even work at all because of hw bugs. So to have a reliable TLB
> >>>>>> flush we must make sure that nothing else is ongoing, and that means a
> >>>>>> CS->VM->CS barrier.
> >>>>>>
> >>>>>> We try very hard to circumvent that already on maps by (for example)
> >>>>>> using a completely new VMID for CS after the VM map operation.
> >>>>>>
> >>>>>> But for the unmap operation we would need some kind of special
> >>>>>> dma_fence implementation which would not only wait for all existing
> >>>>>> dma_fences but also for the ones added until the unmap operation is
> >>>>>> completed. Otherwise the operation we do at #1 would simply not catch
> >>>>>> all dma_fences which have access to the memory.
> >>>>>>
> >>>>>> That's certainly doable, but I think just using the drm_exec stuff I
> >>>>>> already came up with is easier.
> >>>>>>
> >>>>>> When we can grab locks for all the BOs involved, amdgpu_vm_clear_freed()
> >>>>>> goes away and we can keep track of the unmap operations in the bo_va
> >>>>>> structure.
> >>>>>>
> >>>>>> With that done you can make the explicit sync you noted in the bo_va
> >>>>>> structure, and implicit sync when the bo_va structure goes away.
> >>>>>>
> >>>>>> Then the only reason I can see why we would need a CS->VM dependency
> >>>>>> is implicit synchronization, and that's what we are trying to avoid
> >>>>>> here in the first place.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Christian.
> >>>>>>
> >>>>>>>> To get rid of this barrier you must first fix the part where CS
> >>>>>>>> submissions wait for the VM operation to complete, i.e. the necessity
> >>>>>>>> of the barrier.
> >>>>>>>>
> >>>>>>>> I'm working on this for a couple of years now and I'm really running
> >>>>>>>> out of ideas for how to explain this restriction.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Christian.
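As a postscript, and only to make sure we mean the same thing by #1: below
is a very rough sketch of mirroring the page table fences onto the BO in
the GEM close handler. The function name is made up, locking order and
error handling are hand-waved, and the usage level picked for the mirrored
fences is a guess; it only shows the dma_resv iterator plus
dma_resv_add_fence() mechanics (adding the fences rather than replacing the
BO's existing ones).

#include <linux/dma-resv.h>
#include <linux/dma-fence.h>

/*
 * Sketch: mirror every fence currently on the page table reservation
 * object onto the BO's reservation object, so whoever syncs against
 * the BO later also waits for the pending VM updates.  Both objects
 * are assumed to be locked already.
 */
static int sketch_mirror_pt_fences(struct dma_resv *bo_resv,
                                   struct dma_resv *pt_resv)
{
        struct dma_resv_iter cursor;
        struct dma_fence *fence;
        int r;

        dma_resv_assert_held(bo_resv);
        dma_resv_assert_held(pt_resv);

        /* BOOKKEEP iterates over all fences, including the VM updates. */
        dma_resv_for_each_fence(&cursor, pt_resv, DMA_RESV_USAGE_BOOKKEEP,
                                fence) {
                r = dma_resv_reserve_fences(bo_resv, 1);
                if (r)
                        return r;

                /* Usage level here is a guess, see the discussion above. */
                dma_resv_add_fence(bo_resv, fence, DMA_RESV_USAGE_READ);
        }

        return 0;
}

A real version would presumably reserve all the slots up front and pick the
usage level based on the outcome of the READ/BOOKKEEP question above, but
that level of detail belongs in the actual patches.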