When testing large folio support with XFS on our servers, we observed that
only a few large folios are mapped when reading large files via mmap. After
a thorough analysis, I identified that this was caused by the
`/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this
parameter is set to 128KB. After tuning it to 2MB, large folios work as
expected. However, large folio behavior should not depend on the value of
read_ahead_kb; it would be more robust if the kernel adapted to it
automatically.

With `/sys/block/*/queue/read_ahead_kb` set to a non-2MB-aligned size, this
issue can be verified with a simple test case, as shown below:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define LEN (1024 * 1024 * 1024) // 1GB file

int main(int argc, char *argv[])
{
	char *addr;
	int fd, i;

	fd = open("data", O_RDWR);
	if (fd < 0) {
		perror("open");
		exit(-1);
	}

	addr = mmap(NULL, LEN, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		exit(-1);
	}

	if (madvise(addr, LEN, MADV_HUGEPAGE)) {
		perror("madvise");
		exit(-1);
	}

	for (i = 0; i < LEN / 4096; i++)
		memset(addr + i * 4096, 1, 1);

	while (1) {} // Verifiable with /proc/meminfo

	munmap(addr, LEN);
	close(fd);
	exit(0);
}

When large folio support is enabled and read_ahead_kb is set to a smaller
value, ra->size (4MB) may exceed the maximum allowed size (e.g., 128KB). To
address this, we need to add a conditional check for such cases. However,
this alone is insufficient, as users might set read_ahead_kb to a larger,
non-hugepage-aligned value (e.g., 4MB + 128KB). In these instances, it is
essential to explicitly align ra->size with the hugepage size.
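For illustration, the growth-and-align logic after the fix can be modeled in
user space. This is a standalone sketch, not the kernel code: the function
and macro names mirror mm/readahead.c, sizes are in 4KB pages, and order 9
stands in for a 2MB hugepage. The `max` and `cur` values below are assumed
for the example (read_ahead_kb=128KB and a 4MB ra->size).

#include <stdio.h>

/* simplified stand-in for the kernel's ALIGN() (a must be a power of 2) */
#define ALIGN(x, a) (((x) + (a) - 1) & ~((unsigned long)(a) - 1))

/* get_next_ra_size() growth curve with the added "cur > max" check */
static unsigned long get_next_ra_size(unsigned long cur, unsigned long max)
{
	if (cur < max / 16)
		return 4 * cur;
	if (cur <= max / 2)
		return 2 * cur;
	if (cur > max)		/* don't clamp an already-large ra->size */
		return cur;
	return max;
}

int main(void)
{
	unsigned long max = 128 / 4;	/* read_ahead_kb=128KB -> 32 pages */
	unsigned long order = 9;	/* 2MB hugepage = 512 pages */
	unsigned long cur = 1024;	/* ra->size of 4MB = 1024 pages */
	unsigned long next;

	/* Without the added check, next would collapse to max (32 pages,
	 * i.e. 128KB) and large folios could no longer be used. */
	next = ALIGN(get_next_ra_size(cur, max), 1UL << order);
	printf("next ra size: %lu pages (%lu KB)\n", next, next * 4);
	return 0;
}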
Fixes: 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
---
 mm/readahead.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Changes:
RFC->v1:
- Simplify the code as suggested by Matthew
RFC: https://lore.kernel.org/linux-mm/20241104143015.34684-1-laoar.shao@xxxxxxxxx/

diff --git a/mm/readahead.c b/mm/readahead.c
index 3dc6c7a128dd..9e2c6168ebfa 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -385,6 +385,8 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
 		return 4 * cur;
 	if (cur <= max / 2)
 		return 2 * cur;
+	if (cur > max)
+		return cur;
 	return max;
 }
 
@@ -642,7 +644,7 @@ void page_cache_async_ra(struct readahead_control *ractl,
 				  1UL << order);
 	if (index == expected) {
 		ra->start += ra->size;
-		ra->size = get_next_ra_size(ra, max_pages);
+		ra->size = ALIGN(get_next_ra_size(ra, max_pages), 1 << order);
 		ra->async_size = ra->size;
 		goto readit;
 	}
-- 
2.43.5