Re: [LSF/MM TOPIC] mmap locking topics

Suren Baghdasaryan <surenb@xxxxxxxxxx> · Wed, 9 Jun 2021 18:29:23 -0700

On Tue, Jun 1, 2021 at 8:34 PM Michel Lespinasse <michel@xxxxxxxxxxxxxx> wrote:
>
> On Tue, Jun 01, 2021 at 03:01:00PM +0100, Matthew Wilcox wrote:
> > On Mon, May 31, 2021 at 09:48:45PM -0700, Michel Lespinasse wrote:
> > > I - Speculative page faults
> > >
> > > The idea there is to avoid taking the mmap lock during page faults,
> > > at least for the easier cases. This requiers the fault handler to be
> > > a careful to avoid races with mmap writers (and most particularly
> > > munmap), and when the new page is ready to be inserted into the user
> > > process, to verify, at the last moment (after taking the page table
> > > lock), that there has been no race between the fault handler and any
> > > mmap writers.  Such checks can be implemented locally, without hitting
> > > any global locks, which results in very nice scalability improvements
> > > when processing concurrent faults.
> > >
> > > I think the idea is ready for prime time, and a patchset has been proposed,
> > > but it is not getting much traction yet. I suspect we will need to discuss
> > > the idea in person to figure out the next steps.
> >
> > There is a lot of interest in this.  I disagree with Michel's approach
> > in that he wants to use seqlocks to detect whether any modification has
> > been made to the process's address space, whereas I want to use the VMA
> > tree to detect whether any modification has been made to this VMA.
>
> I see the sequence count as being the easy & safe approach, but yes it
> does have limitations that can lead to unnecessary fast path aborts.
> It would be nice checking the VMAs to avoid *some* of these aborts,
> but I do not think that is always applicable either - I wrote about
> that in https://lwn.net/ml/linux-kernel/20210430224649.GA29203@xxxxxxxxxxxxxx/
> ("Thoughts about concurrency checks at the end of the page fault")

In Android we use the previous implementation of SPF posted by Laurent
Dufour (https://lore.kernel.org/patchwork/project/lkml/list/?series=390741&state=%2A&archive=both)
which unfortunately did not make it upstream. It improves start time
of some heavy multi-threaded applications and this is one of the most
important metrics for mobile devices. This patchset is also one of the
biggest out-of-tree patchsets we have to maintain in Android kernel
and it would be immensely valuable for us to have it upstreamed. I
would love to see this discussed at LSF/MM, will provide as much
supporting data as I can and will also push vendors and OEMs to
provide data as well.

>
> > > II - Fine grained MM locking
> > >
> > > A major limitation of the current mmap lock design is that it covers a
> > > process's entire address space. In threaded applications, it is common
> > > for threads to issue concurrent requests for non-overlapping parts of
> > > the process address space - for example, one thread might be mmaping
> > > new memory while another releases a different range, and a third might
> > > fault within his own address range too. The current mmap lock design
> > > does not take the non-overlapping ranges into consideration, and
> > > consequently serialises the 3 above requests rather than letting them
> > > proceed in parallel.
> > >
> > > There has been a lot of work spent mitigating the problem by reducing
> > > the mmap lock hold times (for example, dropping the mmap lock during
> > > page faults that hit disk, or lowering to a read lock during longer
> > > mmap/munmap/populate operations). But this approach is hitting its
> > > limits, and I think it would be better to fix the core of the problem
> > > by making the mmap lock capable of allowing concurrent non-overlapping
> > > operations.
> > >
> > > I would like to propose an approach that:
> > > - separates the mmap lock into two separate locks, one that is only
> > >   held for short periods of time to protect mm-wide data structures
> > >   (including the vma tree), and another that functions as a range lock
> > >   and can be held for longer periods of time;
> > > - allows for incremental conversion from the current code to being
> > >   aware about locking ranges;
> > >
> > > I have been maintaining a prototype for this, which has been shared
> > > with a small set of people. The main holdup is with page fault
> > > performance; in order to allow non-overlapping writers to proceed
> > > while some page faults are in progress, the prototype needs to
> > > maintain a shared structure holding addresses for each pending page
> > > fault. Updating this shared structure gets very expenside in high
> > > concurrency page fault benchmarks, though it seems quite unnoticeable
> > > in macro benchmarks I hae looked at.
> >
> > Here I have larger disagreements with Michel.  I do not believe the
> > range lock is a useful tool for this problem.
>
> Regardless of any proposed solution, do you agree that most of the cases
> where mmap lock blocking happens are between non-overlapping memory
> operations that could conceivably be handled concurrently ?
>
> In other words - do you believe that range locks would be too slow to
> be a useful solution, or is it that you do not think they would
> actually solve the issue ?
>
> > The two topics above seem large enough, but there are other important
> > users of the mmap_sem that also hit contention.  /proc/$pid/maps, smaps
> > and similar files hit priority inversion problems which have been reduced,
> > but not solved.
>
> Yes - I do think this would be worth discussing too. Not sure if that is
> a separate topic, or if this should be brought under a larger theme.
>
> Generally - I think there are many issues people have with mmap
> locking, and it's been really hard to make progress addressing these -
> even when prototype solutions exist - due to a lack of concensus (many
> of the people involved have different ideas as to which of the issues
> are important to them). But I think that's what makes this an important
> topic to be discussed ?
>
> --
> Michel "walken" Lespinasse
>