Re: [Linaro-mm-sig] [PATCH 04/18] dma-fence: prime lockdep annotations

Jason Gunthorpe <jgg@xxxxxxxx> · Fri, 19 Jun 2020 15:18:49 -0300

On Fri, Jun 19, 2020 at 02:09:35PM -0400, Jerome Glisse wrote:
> On Fri, Jun 19, 2020 at 02:23:08PM -0300, Jason Gunthorpe wrote:
> > On Fri, Jun 19, 2020 at 06:19:41PM +0200, Daniel Vetter wrote:
> > 
> > > The madness is only that device B's mmu notifier might need to wait
> > > for fence_B so that the dma operation finishes. Which in turn has to
> > > wait for device A to finish first.
> > 
> > So it sounds like, fundamentally, you've got this graph of operations across
> > an unknown set of drivers and the kernel cannot insert itself in
> > dma_fence hand offs to re-validate any of the buffers involved?
> > Buffers which by definition cannot be touched by the hardware yet.
> > 
> > That really is a pretty horrible place to end up..
> > 
> > Pinning really is the right answer for this kind of work flow. I think
> > converting pinning to notifiers should not be done unless notifier
> > invalidation is relatively bounded.
> > 
> > I know people like notifiers because they give a bit nicer performance
> > in some happy cases, but this cripples all the bad cases..
> > 
> > If pinning doesn't work for some reason maybe we should address that?
> 
> Note that the dma fence issue only applies to userptr buffers, which
> predate any HMM work and thus were already using mmu notifiers. You need
> the mmu notifier there because of fork and other corner cases.

I wonder if we should try to fix the fork case more directly - RDMA
has this same problem and added MADV_DONTFORK a long time ago as a
hacky way to deal with it.
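
For reference, a minimal userspace sketch of that MADV_DONTFORK workaround.
The helper names alloc_dontfork_buffer() and child_lacks_mapping() are made
up for illustration and are not taken from any RDMA library; the only real
interfaces used are mmap(), madvise(), msync(), fork() and waitpid(). It
assumes Linux with glibc defaults. The idea is that a range marked
MADV_DONTFORK is simply not inherited by the child, so fork never sets up COW
on the pinned pages:

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map an anonymous buffer and mark it MADV_DONTFORK
 * so that children created by fork() do not inherit the range.
 * Returns NULL on failure.
 */
void *alloc_dontfork_buffer(size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return NULL;

	if (madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return NULL;
	}
	return buf;
}

/*
 * Hypothetical check: fork and ask the child whether the range is mapped.
 * msync() fails with ENOMEM on an unmapped range, which is what we expect
 * to see in the child. Returns 1 if the child lacks the mapping, 0 if the
 * child still has it, -1 on error.
 */
int child_lacks_mapping(void *buf, size_t len)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		int r = msync(buf, len, MS_ASYNC);

		/* exit code 1: range is not mapped in the child */
		_exit((r != 0 && errno == ENOMEM) ? 1 : 0);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status) == 1;
}
```

This only sidesteps the problem: the child cannot touch the buffer at all, and
the application has to opt in for every range it pins, which is why I call it
hacky.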

Some crazy page pin that resolved COW in a way that always kept the
physical memory with the mm that initiated the pin?

(isn't this broken for O_DIRECT as well anyhow?)

How do mmu_notifiers help the fork case anyhow? Block fork from
progressing?

> I probably need to warn AMD folks again that using HMM means that you
> must be able to update the GPU page table asynchronously without
> fence wait.

It is kind of unrelated to HMM, it just shouldn't be using mmu
notifiers to replace page pinning..

> The issue for AMD is that they already update their GPU page table
> using DMA engine. I believe this is still doable if they use a
> kernel only DMA engine context, where only kernel can queue up jobs
> so that you do not need to wait for unrelated things and you can
> prioritize GPU page table update which should translate into fast GPU
> page table update without DMA fence.

Makes sense.

I'm not sure I saw this in the AMD hmm stuff - it would be good if
someone would look at that. Every time I do it looks like the locking
is wrong.

Jason