Re: synchronize_rcu in munmap?

On Tue, Feb 09, 2021 at 01:38:22PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 09, 2021 at 06:19:35PM +0100, Laurent Dufour wrote:
> > On 09/02/2021 at 15:29, Matthew Wilcox wrote:
> > > On Mon, Feb 08, 2021 at 01:26:43PM +0000, Matthew Wilcox wrote:
> > > > Next problem: /proc/$pid/smaps calls walk_page_vma() which starts out by
> > > > saying:
> > > >          mmap_assert_locked(walk.mm);
> > > > which made me realise that smaps is also going to walk the page tables.
> > > > So the page tables have to be pinned by the existence of the VMA.
> > > > Which means the page tables must be freed by the same RCU callback that
> > > > frees the VMA.  But doing that means that a task which calls mmap();
> > > > munmap(); mmap(); must avoid allocating the same address for the second
> > > > mmap (until the RCU grace period has elapsed), otherwise threads on
> > > > other CPUs may see the stale PTEs instead of the new ones.
> > > > 
> > > > Solution 1: Move the page table freeing into the RCU callback, call
> > > > synchronize_rcu() in munmap().
> > > > 
> > > > Solution 2: Refcount the VMA and free the page tables on refcount
> > > > dropping to zero.  This doesn't actually work because the stale PTE
> > > > problem still exists.
> > > > 
> > > > Solution 3: When unmapping a VMA, instead of erasing the VMA from the
> > > > maple tree, put a "dead" entry in its place.  Once the RCU freeing and the
> > > > TLB shootdown have happened, erase the entry and it can then be allocated.
> > > > If we do that MAP_FIXED will have to synchronize_rcu() if it overlaps
> > > > a dead entry.
> > > 
> > > Solution 4: RCU free the page table pages and teach pagewalk.c to
> > > be RCU-safe.  That means that it will have to use rcu_dereference()
> > > or READ_ONCE to dereference (eg) pmdp, but also allows GUP-fast to run
> > > under the rcu read lock instead of disabling interrupts.
> > 
> > I might be wrong, but my understanding is that the RCU grace period cannot
> > complete while a CPU has IRQs disabled. So as a first step GUP-fast might
> > continue to disable interrupts to walk the page directories safely.
> 
> Yes, this is right. PPC already uses RCU for the TLB flush and the
> GUP-fast trick is safe against that.
> 
> The comments for PPC say the downside of RCU is having to do an
> allocation in paths that really don't want to fail on memory
> exhaustion.
> 
> The pagewalk.c needs to call its ops in a sleepable context, otherwise
> it could just use the normal page table locks.. Not sure RCU could be
> fit into here?

Whether the ops need to sleep depends on the caller of walk_page_*().
The specific problem we're trying to solve here is avoiding taking the
mmap_sem in /proc/$pid/smaps.  Now, we could just disable interrupts
instead of taking the mmap_sem, but I was hoping to do better.

So let's call that Solution 5:
 - smaps disables interrupts while calling pagewalk.
 - pagewalk accepts that it can be called locklessly (uses
   ptep_get_lockless() and so on)
 - smaps figures out how to handle races with khugepaged
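
To make the second bullet a bit more concrete, here is a rough userspace
model of a lockless read of an entry that is stored as two 32-bit halves,
loosely following the low/high retry pattern used for split PTEs.  It is
only a sketch under stated assumptions: struct model_pte and the
model_pte_*() helpers are made-up names for illustration, not kernel API,
and C11 atomics stand in for the kernel's barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_PTE_PRESENT 0x1u	/* hypothetical: present bit lives in the low half */

/* Hypothetical model of a 64-bit entry that can only be accessed 32 bits at a time. */
struct model_pte {
	_Atomic uint32_t low;
	_Atomic uint32_t high;
};

/* Writer: make the entry not-present first, then clear the high half. */
static void model_pte_clear(struct model_pte *p)
{
	atomic_store(&p->low, 0);
	atomic_store(&p->high, 0);
}

/* Writer: go through not-present, set the high half, publish the low half last. */
static void model_pte_set(struct model_pte *p, uint64_t val)
{
	assert(val & MODEL_PTE_PRESENT);	/* this model only installs present entries */
	atomic_store(&p->low, 0);
	atomic_store(&p->high, (uint32_t)(val >> 32));
	atomic_store(&p->low, (uint32_t)val);
}

/*
 * Reader: read low, then high, then re-read low and retry if it changed.
 * Limitation of the model: if the same low value reappears with a different
 * high half between the two reads of low, the reader cannot tell.  Something
 * outside this sketch (for example the IRQs-off trick discussed above) has
 * to rule that out.
 */
static uint64_t model_pte_get_lockless(struct model_pte *p)
{
	uint32_t low, high;

	do {
		low = atomic_load(&p->low);
		high = atomic_load(&p->high);
	} while (low != atomic_load(&p->low));

	return ((uint64_t)high << 32) | low;
}
```

The retry loop only makes the read of one entry self-consistent.  It says
nothing about whether the page table page holding the entry is still
there, which is the part that disabling interrupts (or RCU freeing) is
meant to cover.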




