On Fri, Feb 28, 2020 at 11:13:40AM -0400, Jason Gunthorpe wrote:
> On Fri, Feb 28, 2020 at 04:04:27PM +0100, Jean-Philippe Brucker wrote:
> > On Fri, Feb 28, 2020 at 10:48:44AM -0400, Jason Gunthorpe wrote:
> > > On Fri, Feb 28, 2020 at 03:39:35PM +0100, Jean-Philippe Brucker wrote:
> > > > > > +	list_for_each_entry_rcu(bond, &io_mm->devices, mm_head) {
> > > > > > +		/*
> > > > > > +		 * To ensure that we observe the initialization of io_mm fields
> > > > > > +		 * by io_mm_finalize() before the registration of this bond to
> > > > > > +		 * the list by io_mm_attach(), introduce an address dependency
> > > > > > +		 * between bond and io_mm. It pairs with the smp_store_release()
> > > > > > +		 * from list_add_rcu().
> > > > > > +		 */
> > > > > > +		io_mm = rcu_dereference(bond->io_mm);
> > > > >
> > > > > A rcu_dereference isn't need here, just a normal derference is fine.
> > > >
> > > > bond->io_mm is annotated with __rcu (for iommu_sva_get_pasid_generic(),
> > > > which does bond->io_mm under rcu_read_lock())
> > >
> > > I'm surprised the bond->io_mm can change over the lifetime of the
> > > bond memory..
> >
> > The normal lifetime of the bond is between device driver calls to bind()
> > and unbind(). If the mm exits early, though, we clear bond->io_mm. The
> > bond is then stale but can only be freed when the device driver releases
> > it with unbind().
>
> I usually advocate for simple use of these APIs. The mm_notifier_get()
> should happen in bind() and the matching put should happen in the
> call_rcu callbcak that does the kfree.

I tried to keep it simple like that: normally mmu_notifier_get() is called
in bind(), and mmu_notifier_put() is called in unbind().

Multiple device drivers may call bind() with the same mm. Each bind()
calls mmu_notifier_get(), obtains the same io_mm, and returns a new bond
(a device<->mm link). Each bond is freed by calling unbind(), which calls
mmu_notifier_put().

That's the most common case.

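To make the common case concrete, here is a rough userspace model of that
get/put pairing. It is only a sketch, not the patch itself: the struct
layouts, the single global the_io_mm, and the helpers io_mm_get(),
io_mm_put(), model_bind() and model_unbind() are all made up for
illustration, and a plain counter stands in for what mmu_notifier_get()
and mmu_notifier_put() would track.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for io_mm: one per mm, shared by every bond. */
struct io_mm {
	int users;	/* models the mmu_notifier_get()/put() pairing */
};

/* Hypothetical stand-in for a bond: one device<->mm link. */
struct bond {
	struct io_mm *io_mm;
};

/* Assumption: the model tracks a single mm, so one global is enough. */
static struct io_mm *the_io_mm;

/* Models mmu_notifier_get(): allocate on first use, else take a reference. */
static struct io_mm *io_mm_get(void)
{
	if (!the_io_mm) {
		the_io_mm = calloc(1, sizeof(*the_io_mm));
		if (!the_io_mm)
			return NULL;
	}
	the_io_mm->users++;
	return the_io_mm;
}

/* Models mmu_notifier_put(): drop a reference, free on the last one. */
static void io_mm_put(struct io_mm *io_mm)
{
	assert(io_mm->users > 0);
	if (--io_mm->users == 0) {
		free(io_mm);
		the_io_mm = NULL;
	}
}

/* Each bind() takes one reference and returns a new bond. */
static struct bond *model_bind(void)
{
	struct bond *bond = calloc(1, sizeof(*bond));

	if (!bond)
		return NULL;

	bond->io_mm = io_mm_get();
	if (!bond->io_mm) {
		free(bond);
		return NULL;
	}
	return bond;
}

/* Each unbind() frees its bond and drops the matching reference. */
static void model_unbind(struct bond *bond)
{
	assert(bond);
	/* io_mm may already be NULL if the bond was detached earlier. */
	if (bond->io_mm)
		io_mm_put(bond->io_mm);
	free(bond);
}
```

Two model_bind() calls return distinct bonds that point to the same io_mm,
and the io_mm only goes away once both bonds have been through
model_unbind().
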
Now if the process is killed and the mm disappears, we do need to avoid
use-after-free caused by DMA of the mappings and the page tables. So the
release() callback, before doing invalidate_all, stops DMA and clears the
page table pointer on the IOMMU side. It detaches all bonds from the
io_mm, calling mmu_notifier_put() for each of them. After release(), bond
objects still exist and device drivers still need to free them with
unbind(), but they don't point to an io_mm anymore.

> Then you can never get a stale
> pointer. Don't worry about exit_mmap().
>
> release() is an unusual callback and I see alot of places using it
> wrong. The purpose of release is to invalidate_all, that is it.
>
> Also, confusingly release may be called multiple times in some
> situations, so it shouldn't disturb anything that might impact a 2nd
> call.

I hadn't realized that. The current implementation should be safe against
it, as release() is a no-op if the io_mm doesn't have bonds anymore. Do you
have an example of such a situation? I'm trying to write tests for these
kinds of corner cases.

Thanks,
Jean