> All userfaultfd operations, except write-protect, opportunistically use
> per-vma locks to lock vmas. On failure, attempt again inside mmap_lock
> critical section.
>
> Write-protect operation requires mmap_lock as it iterates over multiple
> vmas.

Hi Lokesh,

Apologies for reviving this old thread. We truly appreciate the excellent
work you've done in transitioning many userfaultfd operations to per-VMA
locks.

However, we've noticed that userfaultfd still remains one of the largest
users of mmap_lock for write operations, the other (binder) having been
addressed recently by Carlos Llamas's "binder: faster page installations"
series:

https://lore.kernel.org/lkml/20241203215452.2820071-1-cmllamas@xxxxxxxxxx/

The HeapTaskDaemon (Java GC) may frequently perform userfaultfd_register()
and userfaultfd_unregister() operations, both of which require the
mmap_lock in write mode to either split or merge VMAs. Since HeapTaskDaemon
is a lower-priority background task, there are cases where, after acquiring
the mmap_lock, it gets preempted by other tasks. As a result, even
high-priority threads waiting for the mmap_lock, whether in writer or
reader mode, can experience significant delays (several hundred
milliseconds in the worst case).

We haven't yet identified an ideal solution for this. However, the Java
heap appears to behave like a "volatile" VMA in its usage pattern. A
somewhat simplistic idea would be to designate a specific region of the
user address space as "volatile" and restrict all "volatile" VMAs to this
isolated region.

We could add a MAP_VOLATILE flag to mmap(). VMAs created with this flag
would be placed in the volatile space, while those without it would be
placed in the non-volatile space. (A rough userspace sketch of this idea
is appended after the sign-off below.)

┌────────────┐TASK_SIZE
│            │
│            │
│            │mmap VOLATILE
┼────────────┤
│            │
│            │
│            │
│            │
│            │default mmap
│            │
│            │
└────────────┘

VMAs in the volatile region are assigned their own volatile_mmap_lock,
which is independent of the mmap_lock for the non-volatile region. In
addition, we ensure that no single VMA spans the boundary between the
volatile and non-volatile regions. This separation prevents frequent
modifications of a small number of volatile VMAs from blocking other
operations on a large number of non-volatile VMAs.

The implementation itself wouldn't be overly complex, but the design might
come across as somewhat hacky.

Lastly, I have two questions:

1. Have you observed similar issues where userfaultfd continues to cause
   lock contention and priority inversion?

2. If so, do you have any ideas or suggestions on how to address this
   problem?

>
> Signed-off-by: Lokesh Gidra <lokeshgidra@xxxxxxxxxx>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> ---
>  fs/userfaultfd.c              |  13 +-
>  include/linux/userfaultfd_k.h |   5 +-
>  mm/huge_memory.c              |   5 +-
>  mm/userfaultfd.c              | 380 ++++++++++++++++++++++++++--------
>  4 files changed, 299 insertions(+), 104 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index c00a021bcce4..60dcfafdc11a 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c

Thanks
Barry
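
P.S. For illustration only, here is a minimal userspace sketch of the idea
above. MAP_VOLATILE is hypothetical: it is only a proposal in this mail,
not part of the current UAPI, and the flag value below is an arbitrary
placeholder. The volatile_mmap_lock mentioned in the comments is likewise
just the proposed design, not existing kernel code.

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical flag: not in any UAPI header, value is a placeholder. */
#ifndef MAP_VOLATILE
#define MAP_VOLATILE 0x800000
#endif

int main(void)
{
	size_t heap_len = 64UL * 1024 * 1024;

	/*
	 * Java-heap-like region: frequently split/merged via
	 * userfaultfd_register()/userfaultfd_unregister(), so place it in
	 * the volatile part of the address space, which would be guarded
	 * by its own volatile_mmap_lock instead of the regular mmap_lock.
	 */
	void *heap = mmap(NULL, heap_len, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_VOLATILE, -1, 0);
	if (heap == MAP_FAILED) {
		perror("mmap(MAP_VOLATILE)");
		return EXIT_FAILURE;
	}

	/*
	 * An ordinary mapping stays in the default region and keeps using
	 * the regular mmap_lock, so it is not blocked by heap VMA churn.
	 */
	void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		munmap(heap, heap_len);
		return EXIT_FAILURE;
	}

	munmap(buf, 4096);
	munmap(heap, heap_len);
	return 0;
}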