On Wed, Feb 19, 2025 at 12:25 AM Kalesh Singh <kaleshsingh@xxxxxxxxxx> wrote:
>
> On Thu, Feb 13, 2025 at 10:18 AM Lorenzo Stoakes
> <lorenzo.stoakes@xxxxxxxxxx> wrote:
> >
> > The guard regions feature was initially implemented to support anonymous
> > mappings only, excluding shmem.
> >
> > This was done so as to introduce the feature carefully and incrementally
> > and to be conservative when considering the various caveats and corner
> > cases that are applicable to file-backed mappings but not to anonymous
> > ones.
> >
> > Now this feature has landed in 6.13, it is time to revisit this and to
> > extend this functionality to file-backed and shmem mappings.
> >
> > In order to make this maximally useful, and since one may map file-backed
> > mappings read-only (for instance ELF images), we also remove the
> > restriction on read-only mappings and permit the establishment of guard
> > regions in any non-hugetlb, non-mlock()'d mapping.
>
> Hi Lorenzo,
>
> Thank you for your work on this.
>
> Have we thought about how guard regions are represented in /proc/*/[s]maps?
>
> In the field, I've found that many applications read the ranges from
> /proc/self/[s]maps to determine what they can access (usually related
> to obfuscation techniques). If they don't know of the guard regions it
> would cause them to crash; I think that we'll need similar entries to
> PROT_NONE (---p) for these, and generally to maintain consistency
> between the behavior and what is being said from /proc/*/[s]maps.

To clarify why the applications may not be aware of their guard
regions -- in the case of the ELF mappings these PROT_NONE-like ranges
(guard regions) would be installed by the dynamic loader, or may be
inherited from the parent (zygote in Android's case).

>
> -- Kalesh
>
> >
> > It is permissible to permit the establishment of guard regions in read-only
> > mappings because the guard regions only reduce access to the mapping, and
> > when removed simply reinstate the existing attributes of the underlying
> > VMA, meaning no access violations can occur.
> >
> > While the change in kernel code introduced in this series is small, the
> > majority of the effort here is spent in extending the testing to assert
> > that the feature works correctly across numerous file-backed mapping
> > scenarios.
> >
> > Every single guard region self-test performed against anonymous memory
> > (which is relevant and not anon-only) has now been updated to also be
> > performed against shmem and a mapping of a file in the working directory.
> >
> > This confirms that all cases also function correctly for file-backed guard
> > regions.
> >
> > In addition a number of other tests are added for specific file-backed
> > mapping scenarios.
> >
> > There are a number of other concerns that one might have with regard to
> > guard regions, addressed below:
> >
> > Readahead
> > ~~~~~~~~~
> >
> > Readahead is a process through which the page cache is populated on the
> > assumption that sequential reads will occur, thus amortising I/O and,
> > through a clever use of the PG_readahead folio flag established during
> > major fault and checked upon minor fault, provides for asynchronous I/O to
> > occur as data is processed, reducing I/O stalls as data is faulted in.
> >
> > Guard regions do not alter this mechanism, which operates at the folio and
> > fault level, but do of course prevent the faulting of folios that would
> > otherwise be mapped.
> >
> > In the instance of a major fault prior to a guard region, synchronous
> > readahead will occur including populating folios in the page cache which
> > the guard regions will, in the case of the mapping in question, prevent
> > access to.
> >
> > In addition, if PG_readahead is placed in a folio that is now inaccessible,
> > this will prevent asynchronous readahead from occurring as it would
> > otherwise do.
> >
> > However, there are mechanisms for heuristically resetting this within
> > readahead regardless, which will 'recover' correct readahead behaviour.
> >
> > Readahead presumes sequential data access; the presence of a guard region
> > clearly indicates that, at least in the guard region, no such sequential
> > access will occur, as it cannot occur there.
> >
> > So this should have very little impact on any real workload. The far more
> > important point is whether readahead causes incorrect or
> > inappropriate mapping of ranges disallowed by the presence of guard
> > regions - this is not the case, as readahead does not 'pre-fault' memory in
> > this fashion.
> >
> > At any rate, any mechanism which would attempt to do so would hit the usual
> > page fault paths, which correctly handle PTE markers as with anonymous
> > mappings.
> >
> > Fault-Around
> > ~~~~~~~~~~~~
> >
> > The fault-around logic, in a similar vein to readahead, attempts to improve
> > efficiency with regard to file-backed memory mappings, however it differs
> > in that it does not try to fetch folios into the page cache that are about
> > to be accessed, but rather pre-maps a range of folios around the faulting
> > address.
> >
> > Guard regions making use of PTE markers makes this relatively trivial, as
> > this case is already handled - see filemap_map_folio_range() and
> > filemap_map_order0_folio() - in both instances, the solution is to simply
> > keep the established page table mappings and let the fault handler take
> > care of PTE markers, as per the comment:
> >
> >         /*
> >          * NOTE: If there're PTE markers, we'll leave them to be
> >          * handled in the specific fault path, and it'll prohibit
> >          * the fault-around logic.
> >          */
> >
> > This works, as establishing guard regions results in page table mappings
> > with PTE markers, and clearing them removes them.
> >
> > Truncation
> > ~~~~~~~~~~
> >
> > File truncation will not eliminate existing guard regions, as the
> > truncation operation will ultimately zap the range via
> > unmap_mapping_range(), which specifically excludes PTE markers.
> >
> > Zapping
> > ~~~~~~~
> >
> > Zapping is, as with anonymous mappings, handled by zap_nonpresent_ptes(),
> > which specifically deals with guard entries, leaving them intact except in
> > instances such as process teardown or munmap() where they need to be
> > removed.
> >
> > Reclaim
> > ~~~~~~~
> >
> > When reclaim is performed on file-backed folios, it ultimately invokes
> > try_to_unmap_one() via the rmap. If the folio is non-large, then map_pte()
> > will ultimately abort the operation for the guard region mapping. If large,
> > then check_pte() will determine that this is a non-device private
> > entry/device-exclusive entry 'swap' PTE and thus abort the operation in
> > that instance.
> >
> > Therefore, no odd things happen in the instance of reclaim being attempted
> > upon a file-backed guard region.
> >
> > Hole Punching
> > ~~~~~~~~~~~~~
> >
> > This updates the page cache and ultimately invokes unmap_mapping_range(),
> > which explicitly leaves PTE markers in place.
> >
> > Because the establishment of guard regions zapped any existing mappings to
> > file-backed folios, once the guard regions are removed then the
> > hole-punched region will be faulted in as usual and everything will behave
> > as expected.
> >
> > Lorenzo Stoakes (4):
> >   mm: allow guard regions in file-backed and read-only mappings
> >   selftests/mm: rename guard-pages to guard-regions
> >   tools/selftests: expand all guard region tests to file-backed
> >   tools/selftests: add file/shmem-backed mapping guard region tests
> >
> >  mm/madvise.c                                  |   8 +-
> >  tools/testing/selftests/mm/.gitignore         |   2 +-
> >  tools/testing/selftests/mm/Makefile           |   2 +-
> >  .../mm/{guard-pages.c => guard-regions.c}     | 921 ++++++++++++++++--
> >  4 files changed, 821 insertions(+), 112 deletions(-)
> >  rename tools/testing/selftests/mm/{guard-pages.c => guard-regions.c} (58%)
> >
> > --
> > 2.48.1