Re: Splitting the mmap_sem

On Thu, Feb 06, 2020 at 12:15:36PM -0800, Matthew Wilcox wrote:
> On Thu, Feb 06, 2020 at 02:59:20PM +0100, Peter Zijlstra wrote:
> > > The proposal consists of three phases.  In phase 1, we convert the
> > > rbtree to the maple tree, and leave the locking alone.  In phase 2,
> > > we change the locking to a per-VMA refcount, looked up under RCU.
> > > 
> > > This problem arises during phase 3 where we attempt to handle page
> > > faults entirely under the RCU read lock.  If we encounter problems,
> > > we can fall back to acquiring the VMA refcount, but we need the
> > > page allocation to fail rather than sleep (or magically drop the
> > > RCU lock and return an indication that it has done so, but that
> > > doesn't seem to be an approach that would find any favour).
> > 
> > So why not use SRCU? You can do full blocking faults under SRCU and
> > don't need no 'stinkin' refcounts ;-)
> 
> I have to say, SRCU is not in my mental toolbox of "how to solve a
> problem", so it simply hadn't occurred to me.  Thanks.
> 
> So, we'd DEFINE_SRCU(vma_srcu); in mm/memory.c
> 
> then, at the beginning of a page fault call srcu_read_lock(&vma_srcu);
> walk the tree as we do now, allocate memory for PTEs, sleep waiting for
> pages to arrive back from disc, etc, etc, then at the end of the fault,
> call srcu_read_unlock(&vma_srcu). 

So far so good,...

> munmap() would consist of removing the
> VMA from the tree, then calling synchronize_srcu() to wait for all faults
> to finish, then putting the backing file, etc, etc and freeing the VMA.

Use call_srcu() instead; the (S)RCU callback then does the fput() and
the rest of the teardown.

synchronize_srcu() (like synchronize_rcu()) is stupid slow and would
make munmap()/exit()/etc. unusable.
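
To make the difference concrete, here is a minimal single-threaded
userspace model of the deferred-teardown idea. It is not kernel code
and not how SRCU is implemented: every toy_* name is hypothetical, the
"grace period" is simplified to "no reader is currently inside a read
section", and the callback only counts the free where the real one
would also fput() the backing file. The point is only that the unmap
path queues the teardown and returns instead of waiting for readers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rcu_head. */
struct toy_head {
	struct toy_head *next;
	void (*func)(struct toy_head *head);
};

/* Hypothetical stand-in for an SRCU domain. */
struct toy_srcu {
	int readers;			/* faults currently in flight */
	struct toy_head *pending;	/* callbacks waiting for readers */
};

struct toy_vma {
	int in_tree;			/* still visible to new lookups? */
	struct toy_head head;
};

static int toy_vmas_freed;		/* lets a caller observe the teardown */

static void toy_read_lock(struct toy_srcu *s)
{
	s->readers++;
}

/* Run queued callbacks, but only once no reader is left. */
static void toy_run_ready(struct toy_srcu *s)
{
	if (s->readers > 0)
		return;
	while (s->pending) {
		struct toy_head *h = s->pending;

		s->pending = h->next;
		h->func(h);
	}
}

static void toy_read_unlock(struct toy_srcu *s)
{
	assert(s->readers > 0);
	s->readers--;
	toy_run_ready(s);
}

/* Model of call_srcu(): queue the callback and return immediately. */
static void toy_call(struct toy_srcu *s, struct toy_head *h,
		     void (*func)(struct toy_head *head))
{
	h->func = func;
	h->next = s->pending;
	s->pending = h;
}

/* Deferred teardown; the real callback would fput() the file here. */
static void toy_vma_free_cb(struct toy_head *h)
{
	struct toy_vma *vma = (struct toy_vma *)
		((char *)h - offsetof(struct toy_vma, head));

	toy_vmas_freed++;
	free(vma);
}

/* Unmap: unpublish the VMA, queue the teardown, do not wait. */
static void toy_munmap(struct toy_srcu *s, struct toy_vma *vma)
{
	vma->in_tree = 0;
	toy_call(s, &vma->head, toy_vma_free_cb);
}
```

With one fault still in flight, toy_munmap() returns right away and
the VMA stays allocated; it is freed only when that last reader calls
toy_read_unlock(). A synchronize_srcu()-style design would instead
block the unmapping task for that whole interval.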

> This seems pretty reasonable, and investigation could actually proceed
> before the Maple tree work lands.  Today, that would be:
> 
> idx = srcu_read_lock(&vma_srcu);
> down_read(&mm->mmap_sem);
> vma = find_vma(mm, address);
> up_read(&mm->mmap_sem);
> ... rest of fault handler path ...
> srcu_read_unlock(&vma_srcu, idx);
> 
> Kind of a pain because we still call find_vma() in the per-arch page
> fault handler, but for prototyping, we'd only have to do one or two
> architectures.

If you look at the earlier speculative page-fault patches by Laurent,
which were based on my still earlier patches, you'll find most of this
there.

The tricky bit was validating everything on the second page-table walk,
to check that nothing had fundamentally changed, specifically the VMA,
before installing the PTE. If you do this without mmap_sem, you need to
hold ptlock to pin stuff while validating everything you did earlier.
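
A rough userspace sketch of that validate-then-install step follows.
It is a single-threaded model, not code from any of the patch sets:
all spec_* names are hypothetical, a boolean stands in for ptlock, and
a per-VMA sequence counter is assumed as the way to detect that the
VMA changed between the first lookup and the final walk.

```c
#include <assert.h>
#include <stdbool.h>

struct spec_vma {
	unsigned long start, end;
	unsigned int seq;	/* assumed: bumped on every VMA change */
};

struct spec_pte {
	bool locked;		/* stands in for ptlock */
	bool present;
};

struct spec_fault {
	struct spec_vma *vma;
	unsigned long address;
	unsigned int seq_snapshot;
};

/* First walk: look up the VMA and remember the state we saw. */
static bool spec_fault_begin(struct spec_fault *f, struct spec_vma *vma,
			     unsigned long address)
{
	if (address < vma->start || address >= vma->end)
		return false;
	f->vma = vma;
	f->address = address;
	f->seq_snapshot = vma->seq;
	return true;
}

/* A writer (mprotect, munmap, ...) changing the VMA under us. */
static void spec_vma_modify(struct spec_vma *vma, unsigned long new_end)
{
	vma->seq++;
	vma->end = new_end;
}

/*
 * Second walk: take the lock, re-validate, and install the PTE only
 * if nothing changed. Returns false when the caller has to fall back
 * to the slow path.
 */
static bool spec_fault_finish(struct spec_fault *f, struct spec_pte *pte)
{
	bool ok;

	assert(!pte->locked);
	pte->locked = true;
	ok = f->vma->seq == f->seq_snapshot && !pte->present;
	if (ok)
		pte->present = true;
	pte->locked = false;
	return ok;
}
```

If spec_vma_modify() runs between spec_fault_begin() and
spec_fault_finish(), the finish step refuses to install the PTE and
the fault would have to be retried under the regular locking.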



