On Thu, Sep 28, 2023 at 10:55:17AM +0100, Ryan Roberts wrote: > Hi all, > > I've just noticed that when applied to a file mapping for a file on xfs, MADV_COLLAPSE returns EINVAL. The same test case works fine if the file is on ext4. > > I think the root cause is that the implementation bails out if it finds a (non-PMD-sized) large folio in the page cache for any part of the file covered by the region. XFS does readahead into large folios so we hit this issue. See khugepaged.h:collapse_file(): > > if (PageTransCompound(page)) { > struct page *head = compound_head(page); > > result = compound_order(head) == HPAGE_PMD_ORDER && > head->index == start > /* Maybe PMD-mapped */ > ? SCAN_PTE_MAPPED_HUGEPAGE > : SCAN_PAGE_COMPOUND; > goto out_unlock; > } I don't see any hint to -EINVAL above. Am I missing something? > > I'm not sure if this is already a known issue? I don't have time to work on a fix for this right now, so thought I would highlight it at least. I might get around to it at some point in the future if nobody else tackles it. > > Thanks, > Ryan > > > Test case I've been using: > > -->8-- > > #include <stdio.h> > #include <stdlib.h> > #include <sys/mman.h> > #include <sys/types.h> > #include <sys/stat.h> > #include <fcntl.h> > #include <unistd.h> > > #ifndef MADV_COLLAPSE > #define MADV_COLLAPSE 25 > #endif > > #define handle_error(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0) > > #define SZ_1K 1024 > #define SZ_1M (SZ_1K * SZ_1K) > #define ALIGN(val, align) (((val) + ((align) - 1)) & ~((align) - 1)) > > #if 1 > // ext4 > #define DATA_FILE "/home/ubuntu/data.txt" > #else > // xfs > #define DATA_FILE "/boot/data.txt" > #endif > > int main(void) > { > int fd; > char *mem; > int ret; > > fd = open(DATA_FILE, O_RDONLY); > if (fd == -1) > handle_error("open"); > > mem = mmap(NULL, SZ_1M * 4, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0); > close(fd); > if (mem == MAP_FAILED) > handle_error("mmap"); > > printf("1: pid=%d, mem=%p\n", getpid(), mem); > getchar(); > > mem = (char *)ALIGN((unsigned long)mem, SZ_1M * 2); > ret = madvise(mem, SZ_1M * 2, MADV_COLLAPSE); > if (ret) > handle_error("madvise"); > > printf("2: pid=%d, mem=%p\n", getpid(), mem); > getchar(); > > return 0; > } > > -->8-- > Confused... -- An old man doll... just what I always wanted! - Clara
Attachment:
signature.asc
Description: PGP signature