On Tue, Feb 18, 2025 at 04:20:18PM +0100, David Hildenbrand wrote:
> > Right yeah that'd be super weird. And I don't want to add that logic.
> >
> > > Also not sure what happens if one does an mlock()/mlockall() after
> > > already installing PTE markers.
> >
> > The existing logic already handles non-present cases by skipping them, in
> > mlock_pte_range():
> >
> > 	for (pte = start_pte; addr != end; pte++, addr += PAGE_SIZE) {
> > 		ptent = ptep_get(pte);
> > 		if (!pte_present(ptent))
> > 			continue;
> >
> > 		...
> > 	}
>
> I *think* that code only updates already-mapped folios, to properly call
> mlock_folio()/munlock_folio().

Guard regions _are_ 'already mapped' :) so it leaves them in place.

do_mlock() -> apply_vma_lock_flags() -> mlock_fixup() ->
mlock_vma_pages_range() implies this will be invoked.

>
> It is not the code that populates pages on mlock()/mlockall(). I think all
> that goes via mm_populate()/__mm_populate(), where "ordinary GUP" should
> apply.

OK, I want to correct what I said earlier.

Installing a guard region and then attempting mlock() will result in an
error. The populate will -EFAULT and stop at the guard region, which causes
mlock() to error out.

This is a partial failure, so the VMA is split and has VM_LOCKED applied,
but the populate halts at the guard region.

This is OK as per previous discussion on aggregate operation failure: there
can be no expectation of 'unwinding' partially successful operations that
form part of a requested aggregate one.

However, given there's stuff to clean up, and on error a user _may_ wish to
then remove the guard regions and try again, I guess there's no harm in
keeping the code as it is, where we allow MADV_GUARD_REMOVE even if
VM_LOCKED is in place.
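To make that sequence concrete, something like the below demonstrates it
from userspace (rough, untested sketch; assumes a guard-regions kernel,
i.e. 6.13+, and the MADV_GUARD_* fallback values are the ones from
asm-generic/mman-common.h in case your libc headers predate them):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102	/* from asm-generic/mman-common.h */
#endif
#ifndef MADV_GUARD_REMOVE
#define MADV_GUARD_REMOVE 103
#endif

int main(void)
{
	const long page = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	/* Guard the middle page. */
	if (madvise(p + page, page, MADV_GUARD_INSTALL))
		return 1;

	/*
	 * The VMA is split and gets VM_LOCKED, but populate halts at the
	 * guard region, so mlock() reports failure.
	 */
	if (mlock(p, 3 * page))
		printf("mlock() failed: %s\n", strerror(errno));

	/* ...but MADV_GUARD_REMOVE is still allowed, so we can clean up... */
	if (madvise(p + page, page, MADV_GUARD_REMOVE))
		return 1;

	/* ...and retry successfully without the guard region. */
	if (mlock(p, 3 * page))
		return 1;

	puts("mlock() succeeded after guard removal");
	return 0;
}

(I believe the in-kernel -EFAULT gets translated on the way out, see
__mlock_posix_error_return(), so userspace most likely observes ENOMEM
rather than EFAULT here.)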
> See populate_vma_page_range(), especially also the VM_LOCKONFAULT
> handling.

Yeah that code is horrible, you just reminded me of it...

'rightly or wrongly' yeah wrongly, very wrongly...

>
> >
> > Which covers off guard regions. Removing the guard regions after this
> > will leave you in a weird situation where these entries will be
> > zapped... maybe we need a patch to make MADV_GUARD_REMOVE check
> > VM_LOCKED and in this case also populate?
>
> Maybe? Or we say that it behaves like MADV_DONTNEED_LOCKED.

See above, no we should not :P This is only good for cleanup after mlock()
failure. No sane program should really be trying to do this; a sane program
would give up here (and it's a _programmatic error_ to try to mlock() a
range with guard regions).

>
> >
> > Actually I think the simpler option is to just disallow
> > MADV_GUARD_REMOVE if you have since locked the range? The code currently
> > allows this on the proviso that 'you aren't zapping locked mappings',
> > but leaves the VMA in a state such that some entries would not be
> > locked.
> >
> > It'd be pretty weird to lock guard regions like this.
> >
> > Having said all that, given what you say below, maybe it's not an issue
> > after all?...
> >
> > >
> > > __mm_populate() would skip whole VMAs in case populate_vma_page_range()
> > > fails. And I would assume populate_vma_page_range() fails on the first
> > > guard when it triggers a page fault.
> > >
> > > OTOH, supporting the mlock-on-fault thingy should be easy. That's
> > > precisely where MADV_DONTNEED_LOCKED originates from:
> > >
> > > commit 9457056ac426e5ed0671356509c8dcce69f8dee0
> > > Author: Johannes Weiner <hannes@xxxxxxxxxxx>
> > > Date:   Thu Mar 24 18:14:12 2022 -0700
> > >
> > >     mm: madvise: MADV_DONTNEED_LOCKED
> > >
> > >     MADV_DONTNEED historically rejects mlocked ranges, but with
> > >     MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without
> > >     populating, there are valid use cases for depopulating locked
> > >     ranges as well.
> >
> > ...Hm, this seems to imply the current guard remove stuff isn't quite so
> > bad, so maybe the assumption that VM_LOCKED implies 'everything is
> > populated' isn't quite as stringent then.
>
> Right, with MCL_ONFAULT at least. Without MCL_ONFAULT, the assumption is
> that everything is populated (unless, apparently, one uses
> MADV_DONTNEED_LOCKED or population failed, maybe).
>
> VM_LOCKONFAULT seems to be the sane case. I wonder why
> MADV_DONTNEED_LOCKED didn't explicitly check for that one ... maybe there
> is a history to that.

Yeah, weird.

>
> >
> > The restriction is as simple as:
> >
> > 	if (behavior != MADV_DONTNEED_LOCKED)
> > 		forbidden |= VM_LOCKED;
> >
> > >
> > > Adding support for that would be indeed nice.
> >
> > I mean it's sort of maybe understandable why you'd want to MADV_DONTNEED
> > locked ranges, but I really don't understand why you'd want to add guard
> > regions to mlock()'ed regions?
>
> Some apps use mlockall(), and it might be nice to just be able to use
> guard pages as if "Nothing happened".

Sadly I think not, given the above :P

>
> E.g., QEMU has the option to use mlockall().
>
> >
> > Then again we're currently asymmetric, as you can add them _before_
> > mlock()'ing...
>
> Right.
>
> --
> Cheers,
>
> David / dhildenb
>

I think the _LOCKED idea is therefore kaput, because it just won't work
properly: populating guard regions fails. It fails because populate tries
to 'touch' the memory, and 'touching' guard region memory causes a
segfault.

This kind of breaks the idea of mlock()'ing guard regions. I think adding
workarounds to make this possible in any way is not really worth it (and
would probably be pretty gross).

We already document that 'mlock()ing lightweight guard regions will fail'
as per the man page, so this is all in line with that.
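For completeness, the 'touching causes a segfault' behaviour is trivial to
observe from userspace with something like this (again a rough, untested
sketch; the sigsetjmp() dance is just the usual trick for surviving the
fault in a demo):

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102	/* from asm-generic/mman-common.h */
#endif

static sigjmp_buf env;

static void on_segv(int sig)
{
	(void)sig;
	siglongjmp(env, 1);
}

int main(void)
{
	const long page = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED || madvise(p, page, MADV_GUARD_INSTALL))
		return 1;

	signal(SIGSEGV, on_segv);

	if (sigsetjmp(env, 1) == 0) {
		(void)*(volatile char *)p;	/* touch the guard region */
		puts("no fault?!");
	} else {
		puts("SIGSEGV on touching the guard region");
	}
	return 0;
}

Populate runs into that same fault in kernel context, where it surfaces as
-EFAULT rather than a signal - hence the mlock() failure described above.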