(fixing typo in cc list: tujinjiang@xxxxxxxxx -> tujinjiang@xxxxxxxxxx) + Liam (JinJiang - you forgot to cc the correct maintainers, please ensure you run scripts/get_maintainers.pl on files you change) On Thu, Dec 05, 2024 at 04:12:12PM +0100, Amir Goldstein wrote: > On Thu, Dec 5, 2024 at 4:04 PM Lorenzo Stoakes > <lorenzo.stoakes@xxxxxxxxxx> wrote: > > > > + Matthew for large folio aspect > > > > On Thu, Dec 05, 2024 at 10:30:38PM +0800, Jinjiang Tu wrote: > > > During our tests in containers, there is a read-only file (i.e., shared > > > libraies) in the overlayfs filesystem, and the underlying filesystem is > > > ext4, which supports large folio. We mmap the file with PROT_READ prot, > > > and then call madvise(MADV_COLLAPSE) for it. However, the madvise call > > > fails and returns EINVAL. > > > > > > The reason is that the mapping address isn't aligned to PMD size. Since > > > overlayfs doesn't support large folio, __get_unmapped_area() doesn't call > > > thp_get_unmapped_area() to get a THP aligned address. > > > > > > To fix it, call get_unmapped_area() with the realfile. > > > > Isn't the correct solution to get overlayfs to support large folios? > > > > > > > > Besides, since overlayfs may be built with CONFIG_OVERLAY_FS=m, we should > > > export get_unmapped_area(). > > > > Yeah, not in favour of this at all. This is an internal implementation > > detail. It seems like you're trying to hack your way into avoiding > > providing support for large folios and to hand it off to the underlying > > file system. > > > > Again, why don't you just support large folios in overlayfs? > > > > This whole discussion seems moot. > overlayfs does not have address_space operations > It does not have its own page cache. And here we see my total lack of knowledge of overlayfs coming into play here :) Thanks for pointing this out. In that case, I object even further to the original of course... > > The file in vma->vm_file is not an overlayfs file at all - it is the > real (e.g. ext4) file > when returning from ovl_mmap() => backing_file_mmap() > so I have very little clue why the proposed solution even works, > but it certainly does not look correct. I think then Jinjiang in this cause you ought to go back to the drawing board and reconsider what might be the underlying issue here. > > Thanks, > Amir. Cheers, Lorenzo