Re: [PATCH v7 4/4] userfaultfd: use per-vma locks in userfaultfd operations

On Thu, Jan 23, 2025 at 04:48:10PM +0000, Lorenzo Stoakes wrote:
> +cc vbabka

Sorry, my mail client messed up and I forgot to actually cc... fixed :)

>
> I realise this is resurrecting an old thread, but it's helpful to cc us mmap
> maintainers, as my mail at least is a nightmare :P Thanks!
>
> On Thu, Jan 23, 2025 at 05:14:27PM +1300, Barry Song wrote:
> > > All userfaultfd operations, except write-protect, opportunistically use
> > > per-vma locks to lock vmas. On failure, attempt again inside mmap_lock
> > > critical section.
> > >
> > > Write-protect operation requires mmap_lock as it iterates over multiple
> > > vmas.
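(For anyone skimming: the opportunistic pattern described above is roughly the
sketch below. The helper name and the fallback handling are only illustrative
of the approach, not the patch's actual code.)

	#include <linux/mm.h>
	#include <linux/mmap_lock.h>

	/*
	 * Illustrative only: try the per-VMA lock first; if that fails
	 * (e.g. the VMA is currently being modified), fall back to looking
	 * the VMA up with mmap_lock held for read.
	 */
	static struct vm_area_struct *uffd_find_and_lock_vma(struct mm_struct *mm,
							     unsigned long addr)
	{
		struct vm_area_struct *vma;

		vma = lock_vma_under_rcu(mm, addr);	/* per-VMA read lock */
		if (vma)
			return vma;

		mmap_read_lock(mm);			/* slow path */
		vma = vma_lookup(mm, addr);
		if (!vma)
			mmap_read_unlock(mm);

		/* Caller drops whichever lock ended up being taken. */
		return vma;
	}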
> >
> > Hi Lokesh,
> >
> > Apologies for reviving this old thread. We truly appreciate the excellent work
> > you’ve done in transitioning many userfaultfd operations to per-VMA locks.
>
> Let me also say - thanks Lokesh!
>
> >
> > However, we’ve noticed that userfaultfd remains one of the largest users
> > of mmap_lock for write operations; the other large user, binder, was recently
> > addressed by Carlos Llamas's "binder: faster page installations" series:
> >
> > https://lore.kernel.org/lkml/20241203215452.2820071-1-cmllamas@xxxxxxxxxx/
> >
> > The HeapTaskDaemon (Java GC) might frequently perform userfaultfd_register()
> > and userfaultfd_unregister() operations, both of which require the mmap_lock
> > in write mode to either split or merge VMAs. Since HeapTaskDaemon is a
> > lower-priority background task, there are cases where, after acquiring the
> > mmap_lock, it gets preempted by other tasks. As a result, even high-priority
> > threads waiting for the mmap_lock, whether in writer or reader mode, can
> > end up experiencing significant delays (the delay can reach several hundred
> > milliseconds in the worst case).
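(For reference, the register/unregister cycle being described is just the
userfaultfd ioctl sequence below; the fd setup and the MISSING mode are
illustrative, but the two register/unregister ioctls are the calls that take
mmap_lock for write in the kernel in order to split or merge VMAs.)

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <linux/userfaultfd.h>

	/* Register and then unregister a heap range with userfaultfd. */
	static int uffd_register_cycle(void *addr, size_t len)
	{
		int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
		struct uffdio_api api = { .api = UFFD_API };
		struct uffdio_register reg = {
			.range = { .start = (unsigned long)addr, .len = len },
			.mode  = UFFDIO_REGISTER_MODE_MISSING,
		};
		struct uffdio_range unreg = {
			.start = (unsigned long)addr,
			.len   = len,
		};

		if (uffd < 0)
			return -1;

		/* REGISTER/UNREGISTER take mmap_lock for write in the kernel. */
		if (ioctl(uffd, UFFDIO_API, &api) ||
		    ioctl(uffd, UFFDIO_REGISTER, &reg) ||
		    ioctl(uffd, UFFDIO_UNREGISTER, &unreg)) {
			close(uffd);
			return -1;
		}

		close(uffd);
		return 0;
	}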
>
> Thanks for reporting - this strikes me as an important report, but I'm not
> sure about your proposed solution :)
>
> However, I wonder if we can't look at handling the locking around uffd
> register/unregister more efficiently?
>
> I think uffd suffers from a complexity problem, in that there are multiple
> moving parts and complicated interactions especially for each of the
> different kinds of events uffd can deal with.
>
> Also, ordering matters a great deal within uffd. Ideally uffd wouldn't
> effectively have open-coded hooks all over the mm code, but that ship
> sailed long ago, so we need to really carefully assess how locking
> changes impact uffd behaviour.
>
> >
> > We haven’t yet identified an ideal solution for this. However, the Java heap
> > appears to behave like a "volatile" vma in its usage. A somewhat simplistic
> > idea would be to designate a specific region of the user address space as
> > "volatile" and restrict all "volatile" VMAs to this isolated region.
> >
> > We could add a MAP_VOLATILE flag to mmap. VMA regions with this flag would be
> > mapped in the volatile space, while those without it would be mapped in the
> > non-volatile space.
>
> This feels like a major thing to do just to suit specific uffd usages, a
> feature that not everybody makes use of.
>
> The virtual address space is a somewhat sensitive subject (see the
> relatively recent discussion on a proposed MAP_BELOW flag), and has a lot
> of landmines you can step on.
>
> How would this range be determined? How would this interact with people
> doing 'interesting' things with MAP_FIXED mappings? What if somebody
> MAP_FIXED into a volatile region and gets this behaviour by mistake?
>
> How do the two locks interact with one another and with vma locks? I mean, we
> already have a very complicated set of interactions between the mmap sem
> and vma locks, where the mmap sem is taken to lock the mmap (of course), but
> now we'd have two?
>
> Also, you have to take the lock for write when you modify the VMA tree - would
> we have two maple trees now? And what about the overhead?
>
> I feel like this is not the right route for this.
>
> >
> >          ┌────────────┐TASK_SIZE
> >          │            │
> >          │            │
> >          │            │mmap VOLATILE
> >          ├────────────┤
> >          │            │
> >          │            │
> >          │            │
> >          │            │
> >          │            │default mmap
> >          │            │
> >          │            │
> >          └────────────┘
> >
> > VMAs in the volatile region are assigned their own volatile_mmap_lock,
> > which is independent of the mmap_lock for the non-volatile region.
> > Additionally, we ensure that no single VMA spans the boundary between
> > the volatile and non-volatile regions. This separation prevents the
> > frequent modifications of a small number of volatile VMAs from blocking
> > other operations on a large number of non-volatile VMAs.
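(Purely as a hypothetical illustration of the split being proposed - none of
these names exist in the kernel - lock acquisition would be routed by which
region an address falls in, something like the sketch below.)

	#include <linux/mm_types.h>
	#include <linux/rwsem.h>

	/* Hypothetical per-process state for the proposed scheme. */
	struct volatile_mm {
		struct mm_struct	*mm;		/* real mm, with its mmap_lock */
		unsigned long		volatile_start;	/* boundary from the diagram */
		struct rw_semaphore	volatile_mmap_lock;
	};

	/* Pick the lock covering the region that 'addr' falls in. */
	static struct rw_semaphore *region_mmap_lock(struct volatile_mm *vmm,
						     unsigned long addr)
	{
		if (addr >= vmm->volatile_start)
			return &vmm->volatile_mmap_lock;
		return &vmm->mm->mmap_lock;
	}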
>
> I think, really, overall this will be solving one can of worms by introducing
> another can of very large worms :P but perhaps I am missing details
> here.
>
> >
> > The implementation itself wouldn’t be overly complex, but the design
> > might come across as somewhat hacky.
> >
> > Lastly, I have two questions:
> >
> > 1. Have you observed similar issues where userfaultfd continues to
> > cause lock contention and priority inversion?
> >
> > 2. If so, do you have any ideas or suggestions on how to address this
> > problem?
>
> Investigating and addressing this, however, is VERY IMPORTANT. Let's perhaps
> look at how we might find a uffd-specific means of addressing these
> problems.
>
> >
> > >
> > > Signed-off-by: Lokesh Gidra <lokeshgidra@xxxxxxxxxx>
> > > Reviewed-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> > > ---
> > >  fs/userfaultfd.c              |  13 +-
> > >  include/linux/userfaultfd_k.h |   5 +-
> > >  mm/huge_memory.c              |   5 +-
> > >  mm/userfaultfd.c              | 380 ++++++++++++++++++++++++++--------
> > >  4 files changed, 299 insertions(+), 104 deletions(-)
> > >
> > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > > index c00a021bcce4..60dcfafdc11a 100644
> > > --- a/fs/userfaultfd.c
> > > +++ b/fs/userfaultfd.c
> >
> > Thanks
> > Barry
> >
>
> Cheers, Lorenzo



