Re: [PATCH v4 01/26] mm/mmu_notifiers: pass private data down to alloc_notifier()

On Fri, Mar 06, 2020 at 09:09:19AM -0400, Jason Gunthorpe wrote:
> On Fri, Mar 06, 2020 at 10:56:14AM +0100, Jean-Philippe Brucker wrote:
> > I tried to keep it simple like that: normally mmu_notifier_get() is called
> > in bind(), and mmu_notifier_put() is called in unbind(). 
> > 
> > Multiple device drivers may call bind() with the same mm. Each bind()
> > calls mmu_notifier_get(), obtains the same io_mm, and returns a new bond
> > (a device<->mm link). Each bond is freed by calling unbind(), which calls
> > mmu_notifier_put().
> > 
> > That's the most common case. Now if the process is killed and the mm
> > disappears, we do need to avoid use-after-free caused by DMA of the
> > mappings and the page tables. 
> 
> This is why release must do invalidate all - but it doesn't need to do
> any more - as no SPTE can be established without a mmget() - and
> mmget() is no longer possible past release.

In our case we don't have SPTEs, the whole pgd is shared between MMU and
IOMMU (isolated using PASID tables).

Taking the concrete example of the crypto accelerator:

1. A process opens a queue in the accelerator. That queue is bound to the
   address space: a PASID is allocated for the mm, and mm->pgd is written
   into the IOMMU PASID table.
2. The process queues some work and waits. In the background, the
   accelerator performs DMA on the process address space, using the
   mm's PASID.
3. Now the process gets killed, and release() is called.

At this point no one has told the device to stop working on this queue, so
it may still be doing DMA on this address space. So the first thing we do
is notify the device driver that the bond is going away, and that it must
stop the queue and flush remaining DMA transactions for this PASID.

Then we also clear the pgd from the IOMMU PASID table. If we only did
invalidate-all and somehow the queue wasn't properly stopped, concurrent
DMA would immediately form new IOTLB entries, since the page tables haven't
been wiped at this point. Later, it would access freed page tables and
mappings (use-after-free). With a cleared pgd, it would just generate
IOMMU fault events, which are undesirable but harmless.
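
To make the difference concrete, here is a toy userspace model of the two
release() variants. It is only a sketch, not the code from the series:
every name in it (toy_mm, toy_pasid_entry, toy_dma, toy_scenario) is made
up for illustration, and the IOTLB is reduced to a single flag. It assumes
a device that keeps issuing DMA after release(), i.e. the "queue wasn't
properly stopped" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only, not kernel code. All names are hypothetical. */

enum dma_result { DMA_OK, DMA_IOMMU_FAULT, DMA_USE_AFTER_FREE };

struct toy_mm {
	int pgd_storage;	/* stands in for the real page tables */
	bool tables_freed;	/* set once the mm teardown has run */
};

struct toy_pasid_entry {
	const void *pgd;	/* table root shared with the MMU, NULL if cleared */
	bool iotlb_valid;	/* stands in for cached IOTLB entries */
};

/* One DMA access issued by the device with this PASID. */
static enum dma_result toy_dma(struct toy_pasid_entry *e,
			       const struct toy_mm *mm)
{
	if (e->iotlb_valid)	/* cached translation, no table walk */
		return mm->tables_freed ? DMA_USE_AFTER_FREE : DMA_OK;
	if (e->pgd == NULL)	/* nothing to walk: IOMMU fault event */
		return DMA_IOMMU_FAULT;
	if (mm->tables_freed)	/* walking freed page tables */
		return DMA_USE_AFTER_FREE;
	e->iotlb_valid = true;	/* walk succeeded, IOTLB is refilled */
	return DMA_OK;
}

/*
 * Returns what a DMA access sees after the mm is gone, for a device
 * that was not stopped by release().
 */
static enum dma_result toy_scenario(bool clear_pgd)
{
	struct toy_mm mm = { .pgd_storage = 0, .tables_freed = false };
	struct toy_pasid_entry e = {
		.pgd = &mm.pgd_storage,	/* bind: pgd written to PASID table */
		.iotlb_valid = false,
	};
	enum dma_result first = toy_dma(&e, &mm);

	assert(first == DMA_OK);	/* normal operation before the kill */
	(void)first;

	/* release() */
	if (clear_pgd)
		e.pgd = NULL;		/* clear pgd from the PASID table */
	e.iotlb_valid = false;		/* invalidate all */

	(void)toy_dma(&e, &mm);		/* straggler DMA racing with teardown */
	mm.tables_freed = true;		/* page tables and mappings are freed */

	return toy_dma(&e, &mm);	/* what the device does afterwards */
}
```

In this model toy_scenario(false), which only invalidates, ends in
DMA_USE_AFTER_FREE because the straggler access refills the IOTLB before
the tables are freed. toy_scenario(true) ends in DMA_IOMMU_FAULT.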

Thanks,
Jean

> > So the release() callback, before doing invalidate_all, stops DMA
> > and clears the page table pointer on the IOMMU side. It detaches all
> > bonds from the io_mm, calling mmu_notifier_put() for each of
> > them. After release(), bond objects still exist and device drivers
> > still need to free them with unbind(), but they don't point to an
> > io_mm anymore.
> 
> Why is so much work needed in release? It really should just be
> invalidate all, usually trying to sort out all the locking for the
> more complicated stuff is not worthwhile.
> 
> If other stuff is implicitly relying on the mm being alive and release
> to fence against that then it is already racy. If it doesn't, then why
> bother doing complicated work in release?
> 
> > > Then you can never get a stale
> > > pointer. Don't worry about exit_mmap().
> > > 
> > > release() is an unusual callback and I see a lot of places using it
> > > wrong. The purpose of release is to invalidate_all, that is it.
> > > 
> > > Also, confusingly release may be called multiple times in some
> > > situations, so it shouldn't disturb anything that might impact a 2nd
> > > call.
> > 
> > I hadn't realized that. The current implementation should be safe against
> > it, as release() is a nop if the io_mm doesn't have bonds anymore. Do you
> > have an example of such a situation?  I'm trying to write tests for these
> > kinds of corner cases.
> 
> Hmm, let me think. Ah, you have to be using mmu_notifier_unregister()
> to get that race. This is one of the things that get/put don't suffer
> from - but they conversely don't guarantee that release() will be
> called, so it is up to the caller to ensure everything is fenced
> before calling put.
> 
> Jason


