On Wed, Jun 15, 2022 at 9:00 AM Christian König <christian.koenig@xxxxxxx> wrote:
>
> Am 06.06.22 um 13:00 schrieb Bas Nieuwenhuizen:
> > On Mon, Jun 6, 2022 at 12:35 PM Christian König
> > <christian.koenig@xxxxxxx> wrote:
> >> [SNIP]
> >> That part won't work at all and would cause additional synchronization
> >> problems.
> >>
> >> First of all, for implicit synced CS we should use READ, not BOOKKEEP,
> >> because BOOKKEEP would incorrectly be ignored by OpenGL importers. I've
> >> fixed it so that this no longer causes memory corruption, but it is
> >> still nice to avoid.
> > Yes, what I'm saying is that for implicit sync, CS submission should add
> > READ fences to the dma resv, and for explicit sync, CS submission should
> > add BOOKKEEP fences.
>
> No, exactly that is wrong.
>
> Implicit CS submissions should add WRITE fences.
>
> Explicit CS submissions should add READ fences.
>
> Only VM updates should add BOOKKEEP fences.
>
> >> BOOKKEEP can only be used by VM updates themselves, so that they don't
> >> interfere with CS.
> > That is exactly why we would go BOOKKEEP for explicit sync CS
> > submissions, no? An explicit submission shouldn't interfere with any
> > other CS submissions. That includes being totally ignored by GL
> > importers (if we want synchronization there between an explicit
> > submission and GL, userspace is expected to use Jason's dma-buf fence
> > import/export IOCTLs).
>
> No, that would break existing DMA-buf rules.
>
> Explicit CS submissions are still a dependency for implicit submissions.

This is exactly what we don't want for explicit submissions, and it is why
I waited with this series until the DMA_RESV_USAGE series landed. We want
to opt out of implicit sync completely and just use the IOCTLs Jason wrote
for backwards compatibility with windowing systems that need it. If
BOOKKEEP isn't meant for that, should we add a new USAGE?
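To make that concrete, here is a rough sketch of the kind of CS-side logic
I have in mind (illustrative only: the helper name and the explicit_sync
parameter are made up, it just leans on the dma_resv_reserve_fences() /
dma_resv_add_fence() API from the DMA_RESV_USAGE series):

#include <linux/dma-resv.h>
#include <linux/dma-fence.h>

/*
 * Sketch: the CS path decides the dma_resv usage from a per-BO
 * explicit sync flag, so the VM code never has to know about it.
 */
static int sketch_cs_attach_fence(struct dma_resv *resv,
                                  struct dma_fence *fence,
                                  bool explicit_sync)
{
        /*
         * BOOKKEEP fences are ignored by implicit sync importers,
         * READ fences are not; this is where the opt-out happens.
         */
        enum dma_resv_usage usage = explicit_sync ?
                DMA_RESV_USAGE_BOOKKEEP : DMA_RESV_USAGE_READ;
        int r;

        dma_resv_assert_held(resv);

        r = dma_resv_reserve_fences(resv, 1);
        if (r)
                return r;

        dma_resv_add_fence(resv, fence, usage);
        return 0;
}

Whether the non-explicit case should be READ or WRITE is exactly the
disagreement above; the only point of the sketch is that the decision
lives in the CS path, not in the VM code.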
>
>
> >
> > Then the second problem is that the VM IOCTL has absolutely no idea what
> > the CS IOCTL would be doing. That's why we have added the EXPLICIT sync
> > flag on the BO.
> > It doesn't need to? We just use a different sync_mode for BOOKKEEP
> > fences vs others:
> > https://patchwork.freedesktop.org/patch/487887/?series=104578&rev=2
>
> No, exactly that's completely broken.
>
> Regards,
> Christian.
>
> >
> > (the nice thing about doing it this way is that it is independent of
> > the IOCTL, i.e. it also works for the delayed mapping changes we trigger
> > on CS submit)
> >
> >> Regards,
> >> Christian.
> >>
> >>>> That should be doable, but then you don't need any of the other changes.
> >>>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>>> #1 is rather easy to fix: you just need to copy all dma_fences from the
> >>>>>> page table dma_resv object over to the BO's dma_resv object in the gem
> >>>>>> close handler, e.g. exactly what you suggested with the dma_resv_copy
> >>>>>> function.
> >>>>>>
> >>>>>> #2 is a nightmare.
> >>>>>>
> >>>>>> We can't move the TLB flush to the end of the unmap operation because
> >>>>>> async TLB flushes are either a bit complicated (double flushes etc.) or
> >>>>>> don't even work at all because of hw bugs. So to have a reliable TLB
> >>>>>> flush we must make sure that nothing else is ongoing, and that means a
> >>>>>> CS->VM->CS barrier.
> >>>>>>
> >>>>>> We try very hard to circumvent that already on maps by (for example)
> >>>>>> using a completely new VMID for CS after the VM map operation.
> >>>>>>
> >>>>>> But for the unmap operation we would need some kind of special
> >>>>>> dma_fence implementation which would not only wait for all existing
> >>>>>> dma_fences but also for the ones added until the unmap operation is
> >>>>>> completed. Otherwise the operation we do at #1 would simply not catch
> >>>>>> all dma_fences which have access to the memory.
> >>>>>>
> >>>>>> That's certainly doable, but I think just using the drm_exec stuff I
> >>>>>> already came up with is easier.
> >>>>>>
> >>>>>> When we can grab locks for all the BOs involved, amdgpu_vm_clear_freed()
> >>>>>> goes away and we can keep track of the unmap operations in the bo_va
> >>>>>> structure.
> >>>>>>
> >>>>>> With that done you can make the explicit sync you noted in the bo_va
> >>>>>> structure, and implicit sync when the bo_va structure goes away.
> >>>>>>
> >>>>>> Then the only reason I can see why we would need a CS->VM dependency
> >>>>>> is implicit synchronization, and that's what we are trying to avoid
> >>>>>> here in the first place.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Christian.
> >>>>>>
> >>>>>>>> To get rid of this barrier you must first fix the part where CS
> >>>>>>>> submissions wait for the VM operation to complete, i.e. the necessity
> >>>>>>>> of the barrier.
> >>>>>>>>
> >>>>>>>> I'm working on this for a couple of years now and I'm really running
> >>>>>>>> out of ideas for how to explain this restriction.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Christian.
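As a postscript, and only to make sure we mean the same thing by #1: below
is a very rough sketch of mirroring the page table fences onto the BO in
the GEM close handler. The function name is made up, locking order and
error handling are hand-waved, and the usage level picked for the mirrored
fences is a guess; it only shows the dma_resv iterator plus
dma_resv_add_fence() mechanics (adding the fences rather than replacing the
BO's existing ones).

#include <linux/dma-resv.h>
#include <linux/dma-fence.h>

/*
 * Sketch: mirror every fence currently on the page table reservation
 * object onto the BO's reservation object, so whoever syncs against
 * the BO later also waits for the pending VM updates.  Both objects
 * are assumed to be locked already.
 */
static int sketch_mirror_pt_fences(struct dma_resv *bo_resv,
                                   struct dma_resv *pt_resv)
{
        struct dma_resv_iter cursor;
        struct dma_fence *fence;
        int r;

        dma_resv_assert_held(bo_resv);
        dma_resv_assert_held(pt_resv);

        /* BOOKKEEP iterates over all fences, including the VM updates. */
        dma_resv_for_each_fence(&cursor, pt_resv, DMA_RESV_USAGE_BOOKKEEP,
                                fence) {
                r = dma_resv_reserve_fences(bo_resv, 1);
                if (r)
                        return r;

                /* Usage level here is a guess, see the discussion above. */
                dma_resv_add_fence(bo_resv, fence, DMA_RESV_USAGE_READ);
        }

        return 0;
}

A real version would presumably reserve all the slots up front and pick the
usage level based on the outcome of the READ/BOOKKEEP question above, but
that level of detail belongs in the actual patches.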