Hi organizers of LSF/MM,

I realize this is a late submission, but I was hoping there might still
be a chance to have this topic considered for discussion.

Problem Statement
=================

Readahead can result in unnecessary page cache pollution for mapped
regions that are never accessed. Current mechanisms to disable readahead
lack granularity: they operate at the file or VMA level rather than on
sub-ranges. This proposal seeks to initiate discussion at LSF/MM to
explore potential solutions for optimizing page cache/readahead
behavior.

Background
==========

The readahead heuristics on file-backed memory mappings can
inadvertently populate the page cache with pages corresponding to
regions that user-space processes are known never to access, e.g. ELF
LOAD segment padding regions. While these pages are ultimately
reclaimable, their presence causes unnecessary I/O, particularly when
there are many such regions.

Although the underlying file can be made sparse in these regions to
mitigate I/O, readahead will still allocate discrete zero pages when
populating the page cache within these ranges. These pages, while
subject to reclaim, introduce additional churn to the LRU. This reclaim
overhead is further exacerbated in filesystems that support
"fault-around" semantics, which can populate the surrounding pages’ PTEs
if they are found in the page cache.

While the memory impact may be negligible for large files containing a
limited number of sparse regions, it becomes appreciable for many small
mappings characterized by numerous holes. This scenario can arise from
efforts to minimize vm_area_struct slab memory footprint.

Limitations of Existing Mechanisms
==================================

fadvise(..., POSIX_FADV_RANDOM, ...): disables readahead for the entire
file, rather than specific sub-regions. The offset and length parameters
primarily serve the POSIX_FADV_WILLNEED [1] and POSIX_FADV_DONTNEED [2]
cases.

madvise(..., MADV_RANDOM, ...): Similarly, this applies to the entire
VMA, rather than specific sub-regions. [3]

Guard Regions: While guard regions for file-backed VMAs circumvent
fault-around concerns, the fundamental issue of unnecessary page cache
population persists. [4]

Empirical Demonstration
=======================

Below is a simple program to demonstrate the issue. Assume that the last
20 pages of the mapping are a region known never to be accessed (perhaps
a guard region). The page counts assume 4 KiB pages. cachestat is a
simple C program I wrote that returns nr_cache for the entire file using
the new cachestat() syscall [5].

cat pollute_page_cache.sh

#!/bin/bash

FILE="myfile.txt"

echo "Creating sparse file of size 25 pages"
truncate -s 100k "$FILE"

apparent_size=$(ls -lahs "$FILE" | awk '{ print $6 }')
echo "Apparent Size: $apparent_size"

real_size=$(ls -lahs "$FILE" | awk '{ print $1 }')
echo "Real Size: $real_size"

nr_cached=$(./cachestat "$FILE" | grep nr_cache: | awk '{ print $2 }')
echo "Number cached pages: $nr_cached"

echo "Reading first 5 pages..."
# Discard the data; only the effect on the page cache matters here.
head -c 20k "$FILE" > /dev/null

nr_cached=$(./cachestat "$FILE" | grep nr_cache: | awk '{ print $2 }')
echo "Number cached pages: $nr_cached"

rm "$FILE"

-------

./pollute_page_cache.sh

Creating sparse file of size 25 pages
Apparent Size: 100K
Real Size: 0
Number cached pages: 0
Reading first 5 pages...
Number cached pages: 25

Only the first 5 pages were read, yet all 25 pages of the file ended up
in the page cache.

Thanks,
Kalesh

[1] https://github.com/torvalds/linux/blob/v6.14-rc3/mm/fadvise.c#L96
[2] https://github.com/torvalds/linux/blob/v6.14-rc3/mm/fadvise.c#L113
[3] https://github.com/torvalds/linux/blob/v6.14-rc3/mm/madvise.c#L1277
[4] https://lore.kernel.org/r/cover.1739469950.git.lorenzo.stoakes@xxxxxxxxxx/
[5] https://lore.kernel.org/r/20230503013608.2431726-3-nphamcs@xxxxxxxxx/
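
The cachestat helper invoked by the script is not included in this mail.
For anyone wanting to reproduce the numbers, a minimal sketch of such a
helper might look like the following. It is an assumption about how the
helper could be written, not the actual program: count_cached_pages(),
struct cs_range and struct cs_result are hypothetical names that mirror
the kernel's struct cachestat_range and struct cachestat, and the
fallback syscall number 451 is the value used on x86-64 and arm64. The
cachestat() syscall requires Linux 6.5 or later.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Older headers may not define the number (451 on x86-64 and arm64). */
#ifndef __NR_cachestat
#define __NR_cachestat 451
#endif

/*
 * Local mirrors of the kernel's struct cachestat_range and
 * struct cachestat, renamed so they cannot clash with newer headers.
 */
struct cs_range {
	uint64_t off;
	uint64_t len;	/* 0 means "from off to the end of the file" */
};

struct cs_result {
	uint64_t nr_cache;
	uint64_t nr_dirty;
	uint64_t nr_writeback;
	uint64_t nr_evicted;
	uint64_t nr_recently_evicted;
};

/*
 * Hypothetical helper: number of page cache pages currently held for
 * the whole file, or -1 if the file cannot be opened or the syscall
 * fails (for example ENOSYS on kernels older than 6.5).
 */
long long count_cached_pages(const char *path)
{
	struct cs_range range = { .off = 0, .len = 0 };
	struct cs_result res = { 0 };
	long long ret = -1;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (syscall(__NR_cachestat, fd, &range, &res, 0) == 0)
		ret = (long long)res.nr_cache;
	close(fd);
	return ret;
}
```

A main() that prints the result in the form "nr_cache: <n>" would match
the grep/awk parsing used in the script.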