Re: Re: Splitting the mmap_sem

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Thu, 6 Feb 2020 12:15:36 -0800

On Thu, Feb 06, 2020 at 02:59:20PM +0100, Peter Zijlstra wrote:
> > The proposal consists of three phases.  In phase 1, we convert the
> > rbtree to the maple tree, and leave the locking alone.  In phase 2,
> > we change the locking to a per-VMA refcount, looked up under RCU.
> > 
> > This problem arises during phase 3 where we attempt to handle page
> > faults entirely under the RCU read lock.  If we encounter problems,
> > we can fall back to acquiring the VMA refcount, but we need the
> > page allocation to fail rather than sleep (or magically drop the
> > RCU lock and return an indication that it has done so, but that
> > doesn't seem to be an approach that would find any favour).
> 
> So why not use SRCU? You can do full blocking faults under SRCU and
> don't need no 'stinkin' refcounts ;-)

I have to say, SRCU is not in my mental toolbox of "how to solve a
problem", so it simply hadn't occurred to me.  Thanks.

So, we'd add DEFINE_SRCU(vma_srcu); to mm/memory.c.

Then, at the beginning of a page fault, call idx = srcu_read_lock(&vma_srcu);
walk the tree as we do now, allocate memory for PTEs, sleep waiting for
pages to arrive back from disc, etc, etc, then at the end of the fault,
call srcu_read_unlock(&vma_srcu, idx).  munmap() would consist of removing
the VMA from the tree, then calling synchronize_srcu(&vma_srcu) to wait for
all in-flight faults to finish, then putting the backing file, etc, etc and
freeing the VMA.
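The ordering munmap() relies on (unpublish, wait for readers, then free) can be modelled in userspace. This is a minimal sketch under stated assumptions, not kernel code: toy_srcu_*, toy_vma, tree_slot, fault(), toy_munmap() and run_demo() are all hypothetical names, and the toy grace period just waits for one global reader count to reach zero, which is not how kernel SRCU is built.

```c
/* Userspace model of the SRCU-protected fault vs munmap() ordering.
 * All names are hypothetical; toy_srcu is NOT how kernel SRCU works. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h>

/* Toy grace-period mechanism: a single global count of active readers.
 * Writers can be starved by a steady stream of readers. */
struct toy_srcu { atomic_long readers; };

static int toy_srcu_read_lock(struct toy_srcu *s)
{
	atomic_fetch_add(&s->readers, 1);	/* seq_cst */
	return 0;	/* kernel srcu_read_lock() returns a real index */
}

static void toy_srcu_read_unlock(struct toy_srcu *s, int idx)
{
	(void)idx;
	atomic_fetch_sub(&s->readers, 1);
}

/* Wait until no reader is inside a read-side section. */
static void toy_synchronize_srcu(struct toy_srcu *s)
{
	while (atomic_load(&s->readers) != 0)
		sched_yield();
}

struct toy_vma { unsigned long start, end; };	/* stand-in for a VMA */

static struct toy_srcu vma_srcu;
static _Atomic(struct toy_vma *) tree_slot;	/* one-entry "VMA tree" */
static atomic_int fault_has_vma, fault_done;
static long fault_result;

/* Page fault: look up the VMA, sleep, keep using the VMA, all under SRCU. */
static void *fault(void *arg)
{
	struct timespec io = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	int idx = toy_srcu_read_lock(&vma_srcu);
	struct toy_vma *vma = atomic_load(&tree_slot);

	(void)arg;
	if (vma) {
		atomic_store(&fault_has_vma, 1);
		nanosleep(&io, NULL);	/* models sleeping on disc I/O */
		fault_result = (long)(vma->end - vma->start);
	}
	atomic_store(&fault_done, 1);
	toy_srcu_read_unlock(&vma_srcu, idx);
	return NULL;
}

/* munmap(): unpublish, wait for in-flight faults, then free.
 * Returns 1 if the fault had finished before the VMA was freed. */
static int toy_munmap(void)
{
	struct toy_vma *vma = atomic_exchange(&tree_slot, NULL);
	int fault_finished;

	toy_synchronize_srcu(&vma_srcu);
	fault_finished = atomic_load(&fault_done);
	free(vma);	/* putting the backing file etc. would also go here */
	return fault_finished;
}

/* Start one fault, unmap while it sleeps; returns toy_munmap()'s result. */
static int run_demo(void)
{
	struct toy_vma *vma = malloc(sizeof(*vma));
	pthread_t t;
	int ret;

	if (!vma)
		return -1;
	vma->start = 0x1000;
	vma->end = 0x3000;
	atomic_store(&fault_has_vma, 0);
	atomic_store(&fault_done, 0);
	atomic_store(&tree_slot, vma);
	if (pthread_create(&t, NULL, fault, NULL) != 0) {
		free(atomic_exchange(&tree_slot, NULL));
		return -1;
	}
	while (!atomic_load(&fault_has_vma))	/* fault is now mid-flight */
		sched_yield();
	ret = toy_munmap();
	pthread_join(t, NULL);
	return ret;
}
```

Without the toy_synchronize_srcu() call, toy_munmap() could free the VMA while fault() is asleep, and fault() would then read freed memory; the grace-period wait is what lets the fault sleep without holding mmap_sem or a refcount. Kernel SRCU uses per-CPU counters precisely so that writers are not starved the way this toy's single counter allows.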

This seems pretty reasonable, and investigation could actually proceed
before the Maple tree work lands.  Today, that would be:

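int idx;
struct vm_area_struct *vma;

idx = srcu_read_lock(&vma_srcu);	/* covers the whole fault */
down_read(&mm->mmap_sem);		/* mmap_sem only guards the lookup */
vma = find_vma(mm, address);
up_read(&mm->mmap_sem);
/* ... rest of fault handler path, may sleep ... */
srcu_read_unlock(&vma_srcu, idx);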

Kind of a pain because we still call find_vma() in the per-arch page
fault handler, but for prototyping, we'd only have to do one or two
architectures.