[merged mm-stable] mm-allow-guard-regions-in-file-backed-and-read-only-mappings.patch removed from -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The quilt patch titled
     Subject: mm: allow guard regions in file-backed and read-only mappings
has been removed from the -mm tree.  Its filename was
     mm-allow-guard-regions-in-file-backed-and-read-only-mappings.patch

This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
Subject: mm: allow guard regions in file-backed and read-only mappings
Date: Thu, 13 Feb 2025 18:17:00 +0000

Patch series "mm: permit guard regions for file-backed/shmem mappings".

The guard regions feature was initially implemented to support anonymous
mappings only, excluding shmem.

This was done so as to introduce the feature carefully and incrementally
and to be conservative when considering the various caveats and corner
cases that are applicable to file-backed mappings but not to anonymous
ones.

Now this feature has landed in 6.13, it is time to revisit this and to
extend this functionality to file-backed and shmem mappings.

In order to make this maximally useful, and since one may map file-backed
mappings read-only (for instance ELF images), we also remove the
restriction on read-only mappings and permit the establishment of guard
regions in any non-hugetlb, non-mlock()'d mapping.

It is permissible to permit the establishment of guard regions in
read-only mappings because the guard regions only reduce access to the
mapping, and when removed simply reinstate the existing attributes of the
underlying VMA, meaning no access violations can occur.

While the change in kernel code introduced in this series is small, the
majority of the effort here is spent in extending the testing to assert
that the feature works correctly across numerous file-backed mapping
scenarios.

Every single guard region self-test performed against anonymous memory
(which is relevant and not anon-only) has now been updated to also be
performed against shmem and a mapping of a file in the working directory.

This confirms that all cases also function correctly for file-backed guard
regions.

In addition a number of other tests are added for specific file-backed
mapping scenarios.

There are a number of other concerns that one might have with regard to
guard regions, addressed below:

Readahead
~~~~~~~~~

Readahead is a process through which the page cache is populated on the
assumption that sequential reads will occur, thus amortising I/O and,
through a clever use of the PG_readahead folio flag establishing during
major fault and checked upon minor fault, provides for asynchronous I/O to
occur as dat is processed, reducing I/O stalls as data is faulted in.

Guard regions do not alter this mechanism which operates at the folio and
fault level, but does of course prevent the faulting of folios that would
otherwise be mapped.

In the instance of a major fault prior to a guard region, synchronous
readahead will occur including populating folios in the page cache which
the guard regions will, in the case of the mapping in question, prevent
access to.

In addition, if PG_readahead is placed in a folio that is now
inaccessible, this will prevent asynchronous readahead from occurring as
it would otherwise do.

However, there are mechanisms for heuristically resetting this within
readahead regardless, which will 'recover' correct readahead behaviour.

Readahead presumes sequential data access, the presence of a guard region
clearly indicates that, at least in the guard region, no such sequential
access will occur, as it cannot occur there.

So this should have very little impact on any real workload.  The far more
important point is as to whether readahead causes incorrect or
inappropriate mapping of ranges disallowed by the presence of guard
regions - this is not the case, as readahead does not 'pre-fault' memory
in this fashion.

At any rate, any mechanism which would attempt to do so would hit the
usual page fault paths, which correctly handle PTE markers as with
anonymous mappings.

Fault-Around
~~~~~~~~~~~~

The fault-around logic, in a similar vein to readahead, attempts to
improve efficiency with regard to file-backed memory mappings, however it
differs in that it does not try to fetch folios into the page cache that
are about to be accessed, but rather pre-maps a range of folios around the
faulting address.

Guard regions making use of PTE markers makes this relatively trivial, as
this case is already handled - see filemap_map_folio_range() and
filemap_map_order0_folio() - in both instances, the solution is to simply
keep the established page table mappings and let the fault handler take
care of PTE markers, as per the comment:

	/*
	 * NOTE: If there're PTE markers, we'll leave them to be
	 * handled in the specific fault path, and it'll prohibit
	 * the fault-around logic.
	 */

This works, as establishing guard regions results in page table mappings
with PTE markers, and clearing them removes them.

Truncation
~~~~~~~~~~

File truncation will not eliminate existing guard regions, as the
truncation operation will ultimately zap the range via
unmap_mapping_range(), which specifically excludes PTE markers.

Zapping
~~~~~~~

Zapping is, as with anonymous mappings, handled by zap_nonpresent_ptes(),
which specifically deals with guard entries, leaving them intact except in
instances such as process teardown or munmap() where they need to be
removed.

Reclaim
~~~~~~~

When reclaim is performed on file-backed folios, it ultimately invokes
try_to_unmap_one() via the rmap.  If the folio is non-large, then
map_pte() will ultimately abort the operation for the guard region
mapping.  If large, then check_pte() will determine that this is a
non-device private entry/device-exclusive entry 'swap' PTE and thus abort
the operation in that instance.

Therefore, no odd things happen in the instance of reclaim being attempted
upon a file-backed guard region.

Hole Punching
~~~~~~~~~~~~~

This updates the page cache and ultimately invokes unmap_mapping_range(),
which explicitly leaves PTE markers in place.

Because the establishment of guard regions zapped any existing mappings to
file-backed folios, once the guard regions are removed then the
hole-punched region will be faulted in as usual and everything will behave
as expected.


One thing to note with this series is that it now implies file-backed
VMAs which install guard regions will now have an anon_vma installed if
not already present (i.e.  if not post-CoW MAP_PRIVATE).

I have audited kernel source for instances of vma->anon_vma checks and
found nowhere where this would be problematic for pure file-backed
mappings.

I also discussed (off-list) with Matthew who confirmed he can't see any
issue with this.

In effect, we treat these VMAs as if they are MAP_PRIVATE, only with 0
CoW'd pages.  As a result, the rmap never has a reason to reference the
anon_vma from folios at any point and thus no unexpected or weird
behaviour results.

The anon_vma logic tries to avoid unnecessary anon_vma propagation on
fork so we ought to at least minimise overhead.

However, this is still overhead, and unwelcome overhead.

The whole reason we do this (in madvise_guard_install()) is to ensure
that fork _copies page tables_.  Otherwise, in vma_needs_copy(),
nothing will indicate that we should do so.

This was already an unpleasant thing to have to do, but without a new
VMA flag we really have no reasonable means of ensuring this happens.

Going forward, I intend to add a new VMA flag, VM_MAYBE_GUARDED or
something like this.

This would have specific behaviour - for the purposes of merging, it
would be ignored.  However on both split and merge, it will be
propagated.  It is therefore 'sticky'.

This is to avoid having to traverse page tables to determine which
parts of a VMA contain guard regions and of course to maintain the
desirable qualities of guard regions - the lack of VMA propagation (+
thus slab allocations of VMAs).

Adding this flag and adjusting vma_needs_copy() to reference it would
resolve the issue.

However :) we have a VMA flag space issue, so it'd render this a 64-bit
feature only.

Having discussed with Matthew a plan by which to perhaps extend
available flags for 32-bit we may going forward be able to avoid this. 
But this may be a longer term project.

In the meantime, we'd have to resort to the anon_vma hack for 32-bit,
using the flag for 64-bit.  The issue with this however is if we do
then intend to allow the flag to enable /proc/$pid/maps visibility
(something this could allow), it would also end up being 64-bit only
which would be a pity.

Regardless - I wanted to highlight this behaviour as it is perhaps
somewhat surprising.


This patch (of 4):

There is no reason to disallow guard regions in file-backed mappings -
readahead and fault-around both function correctly in the presence of PTE
markers, equally other operations relating to memory-mapped files function
correctly.

Additionally, read-only mappings if introducing guard-regions, only
restrict the mapping further, which means there is no violation of any
access rights by permitting this to be so.

Removing this restriction allows for read-only mapped files (such as
executable files) correctly which would otherwise not be permitted.

Link: https://lkml.kernel.org/r/cover.1739469950.git.lorenzo.stoakes@xxxxxxxxxx
Link: https://lkml.kernel.org/r/d885cb259174736c2830a5dfe07f81b214ef3faa.1739469950.git.lorenzo.stoakes@xxxxxxxxxx
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx>
Acked-by: David Hildenbrand <david@xxxxxxxxxx>
Cc: Jann Horn <jannh@xxxxxxxxxx>
Cc: John Hubbard <jhubbard@xxxxxxxxxx>
Cc: Kalesh Singh <kaleshsingh@xxxxxxxxxx>
Cc: Liam Howlett <liam.howlett@xxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: "Paul E . McKenney" <paulmck@xxxxxxxxxx>
Cc: Shuah Khan <shuah@xxxxxxxxxx>
Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/madvise.c |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

--- a/mm/madvise.c~mm-allow-guard-regions-in-file-backed-and-read-only-mappings
+++ a/mm/madvise.c
@@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm
 	if (!allow_locked)
 		disallowed |= VM_LOCKED;
 
-	if (!vma_is_anonymous(vma))
-		return false;
-
-	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
-		return false;
-
-	return true;
+	return !(vma->vm_flags & disallowed);
 }
 
 static bool is_guard_pte_marker(pte_t ptent)
_

Patches currently in -mm which might be from lorenzo.stoakes@xxxxxxxxxx are

mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0.patch
mm-mremap-refactor-mremap-system-call-implementation.patch
mm-mremap-introduce-and-use-vma_remap_struct-threaded-state.patch
mm-mremap-initial-refactor-of-move_vma.patch
mm-mremap-complete-refactor-of-move_vma.patch
mm-mremap-refactor-move_page_tables-abstracting-state.patch
mm-mremap-thread-state-through-move-page-table-operation.patch





[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux