On Fri, Jun 19, 2020 at 2:09 PM Jerome Glisse <jglisse@xxxxxxxxxx> wrote:
>
> On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> > On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
> >
> > > The madness is only that device B's mmu notifier might need to wait
> > > for fence_B so that the dma operation finishes. Which in turn has to
> > > wait for device A to finish first.
> >
> > So, it sound, fundamentally you've got this graph of operations across
> > an unknown set of drivers and the kernel cannot insert itself in
> > dma_fence hand offs to re-validate any of the buffers involved?
> > Buffers which by definition cannot be touched by the hardware yet.
> >
> > That really is a pretty horrible place to end up..
> >
> > Pinning really is right answer for this kind of work flow. I think
> > converting pinning to notifers should not be done unless notifier
> > invalidation is relatively bounded.
> >
> > I know people like notifiers because they give a bit nicer performance
> > in some happy cases, but this cripples all the bad cases..
> >
> > If pinning doesn't work for some reason maybe we should address that?
>
> Note that the dma fence is only true for user ptr buffer which predate
> any HMM work and thus were using mmu notifier already. You need the
> mmu notifier there because of fork and other corner cases.
>
> For nouveau the notifier do not need to wait for anything it can update
> the GPU page table right away. Modulo needing to write to GPU memory
> using dma engine if the GPU page table is in GPU memory that is not
> accessible from the CPU but that's never the case for nouveau so far
> (but i expect it will be at one point).
>
>
> So i see this as 2 different cases, the user ptr case, which does pin
> pages by the way, where things are synchronous. Versus the HMM cases
> where everything is asynchronous.
>
>
> I probably need to warn AMD folks again that using HMM means that you
> must be able to update the GPU page table asynchronously without
> fence wait. The issue for AMD is that they already update their GPU
> page table using DMA engine. I believe this is still doable if they
> use a kernel only DMA engine context, where only kernel can queue up
> jobs so that you do not need to wait for unrelated things and you can
> prioritize GPU page table update which should translate in fast GPU
> page table update without DMA fence.

All devices which support recoverable page faults also have a
dedicated paging engine for the kernel driver which the driver already
makes use of. We can also update the GPU page tables with the CPU.

Alex

>
>
> > > Full disclosure: We are aware that we've designed ourselves into an
> > > impressive corner here, and there's lots of talks going on about
> > > untangling the dma synchronization from the memory management
> > > completely. But
> >
> > I think the documenting is really important: only GPU should be using
> > this stuff and driving notifiers this way. Complete NO for any
> > totally-not-a-GPU things in drivers/accel for sure.
>
> Yes for user that expect HMM they need to be asynchronous. But it is
> hard to revert user ptr has it was done a long time ago.
>
> Cheers,
> Jérôme
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx