[LSF/MM/BPF TOPIC] Replacing mmap_sem with finer grained locks

Michel Lespinasse <walken@xxxxxxxxxx> · Fri, 14 Feb 2020 05:03:38 -0800

Hi,

I would like to propose this topic for LSF/MM 2020. This is a
continuation of discussions that were started at LSF/MM 2019 and have
informally continued since between the copied folks and me.

The fact that mmap_sem locks the entire MM is causing a lot of
problems. The fundamental design hasn't changed in 20+ years, though a
number of hacks have been added (such as releasing the mmap_sem during
page faults) to work around the worst issues with it. In modern
threaded workloads, we often see multiple threads running
non-overlapping memory operations, which end up unnecessarily blocking
on each other because mmap_sem only supports locking the entire MM
rather than just the address range each thread is operating on.

I have been working on a patch set to replace the mmap_sem rwsem with
a range lock, which should resolve this issue. So far this is
implemented for the page fault path and some very narrow cases of
mmap(). I am working to broaden the scope of the mmap changes before
sending this patch set publicly. I also know Davidlohr and Vlastimil
have worked on similar approaches in the past.
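
To make the range lock idea concrete, here is a minimal userspace
sketch using pthreads. It is not the code from the patch set: every
name in it (struct range_lock, range_lock_try(), MAX_HELD, and so on)
is made up for illustration, it only supports exclusive holders, and it
scans a small fixed array where a real implementation would presumably
want something like an interval tree. The only point is that two
callers block each other only when their [start, end) ranges overlap.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD 64 /* arbitrary cap, chosen only for this sketch */

/* Hypothetical types; not taken from any kernel patch set. */
struct range { unsigned long start, end; }; /* half-open [start, end) */

struct range_lock {
    pthread_mutex_t mu;       /* protects held[] and nr_held */
    pthread_cond_t released;  /* broadcast whenever a range is dropped */
    struct range held[MAX_HELD];
    size_t nr_held;
};

void range_lock_init(struct range_lock *rl)
{
    pthread_mutex_init(&rl->mu, NULL);
    pthread_cond_init(&rl->released, NULL);
    rl->nr_held = 0;
}

/* Two half-open ranges overlap iff each starts before the other ends. */
static bool ranges_overlap(struct range a, struct range b)
{
    return a.start < b.end && b.start < a.end;
}

/* Caller must hold rl->mu. True if r cannot be granted right now. */
static bool must_wait(const struct range_lock *rl, struct range r)
{
    if (rl->nr_held == MAX_HELD)
        return true;
    for (size_t i = 0; i < rl->nr_held; i++)
        if (ranges_overlap(rl->held[i], r))
            return true;
    return false;
}

/* Take [start, end) without blocking; returns false if it is busy. */
bool range_lock_try(struct range_lock *rl,
                    unsigned long start, unsigned long end)
{
    struct range r = { start, end };
    bool ok = false;

    assert(start < end);
    pthread_mutex_lock(&rl->mu);
    if (!must_wait(rl, r)) {
        rl->held[rl->nr_held++] = r;
        ok = true;
    }
    pthread_mutex_unlock(&rl->mu);
    return ok;
}

/* Block until no currently held range overlaps [start, end). */
void range_lock_acquire(struct range_lock *rl,
                        unsigned long start, unsigned long end)
{
    struct range r = { start, end };

    assert(start < end);
    pthread_mutex_lock(&rl->mu);
    while (must_wait(rl, r))
        pthread_cond_wait(&rl->released, &rl->mu);
    rl->held[rl->nr_held++] = r;
    pthread_mutex_unlock(&rl->mu);
}

/* Drop a range previously taken with exactly the same bounds. */
void range_lock_release(struct range_lock *rl,
                        unsigned long start, unsigned long end)
{
    pthread_mutex_lock(&rl->mu);
    for (size_t i = 0; i < rl->nr_held; i++) {
        if (rl->held[i].start == start && rl->held[i].end == end) {
            rl->held[i] = rl->held[--rl->nr_held]; /* swap-remove */
            break;
        }
    }
    pthread_cond_broadcast(&rl->released);
    pthread_mutex_unlock(&rl->mu);
}
```

With this sketch, a thread working on [0x1000, 0x2000) and another
working on [0x2000, 0x3000) both proceed immediately, whereas with a
single rwsem covering the whole address space a writer on either range
would exclude the other. Locking [0, ~0UL) gives back the old
whole-MM behavior for paths that have not been converted.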

Another approach that is being explored is speculative page faults.
Peter and Laurent have worked on this in the past, and Matthew is
giving it another look at the moment. I think this approaches the
problem from a different angle: it is not as generic (my understanding
is that it only works for the page fault path), but it is more
efficient for the cases that it handles.

I would really like to have a new discussion about this, to go over
the concrete proposals that people have been working on and to set a
direction moving forward.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.