On 2024/12/5 23:24, Lorenzo Stoakes wrote:
(fixing typo in cc list: tujinjiang@xxxxxxxxx -> tujinjiang@xxxxxxxxxx)
+ Liam
(Jinjiang - you forgot to cc the correct maintainers, please ensure you run
scripts/get_maintainer.pl on the files you change)
On Thu, Dec 05, 2024 at 04:12:12PM +0100, Amir Goldstein wrote:
On Thu, Dec 5, 2024 at 4:04 PM Lorenzo Stoakes
<lorenzo.stoakes@xxxxxxxxxx> wrote:
+ Matthew for large folio aspect
On Thu, Dec 05, 2024 at 10:30:38PM +0800, Jinjiang Tu wrote:
During our tests in containers, there is a read-only file (e.g., a shared
library) in the overlayfs filesystem, and the underlying filesystem is
ext4, which supports large folios. We mmap the file with PROT_READ
and then call madvise(MADV_COLLAPSE) on it. However, the madvise call
fails and returns EINVAL.
The reason is that the mapping address isn't aligned to PMD size. Since
overlayfs doesn't support large folios, __get_unmapped_area() doesn't call
thp_get_unmapped_area() to get a THP-aligned address.
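
The reproduction described above can be sketched roughly as follows. This is a minimal userspace sketch, not the actual test from the report: the helper names is_pmd_aligned() and try_collapse() are hypothetical, MAP_PRIVATE is an assumed flag (the report only mentions PROT_READ), and the 2 MiB PMD size assumes x86-64 with 4 KiB base pages.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* value from the Linux uapi headers (Linux 6.1+) */
#endif

/* Assumption: 2 MiB PMD size (x86-64 with 4 KiB base pages). */
#define ASSUMED_PMD_SIZE (2UL << 20)

/* Hypothetical helper: is addr aligned to the assumed PMD size? */
static int is_pmd_aligned(const void *addr)
{
    return ((uintptr_t)addr & (ASSUMED_PMD_SIZE - 1)) == 0;
}

/*
 * Hypothetical helper: map `len` bytes of `path` read-only and ask the
 * kernel to collapse the range. Returns 0 on success or an errno value.
 */
static int try_collapse(const char *path, size_t len)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;

    /* MAP_PRIVATE is an assumption; the report does not name the flags. */
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    int err = (p == MAP_FAILED) ? errno : 0;
    close(fd);
    if (err)
        return err;

    printf("mapped at %p, PMD-aligned: %s\n", p,
           is_pmd_aligned(p) ? "yes" : "no");

    err = madvise(p, len, MADV_COLLAPSE) ? errno : 0;
    munmap(p, len);
    return err;
}
```

Calling try_collapse() on a sufficiently large file on overlayfs and again on the same file through the underlying ext4 mount would show whether the reported address differs in alignment between the two.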
To fix it, call get_unmapped_area() with the realfile.
Isn't the correct solution to get overlayfs to support large folios?
Besides, since overlayfs may be built with CONFIG_OVERLAY_FS=m, we should
export get_unmapped_area().
Yeah, not in favour of this at all. This is an internal implementation
detail. It seems like you're trying to hack your way into avoiding
providing support for large folios and to hand it off to the underlying
file system.
Again, why don't you just support large folios in overlayfs?
This whole discussion seems moot.
overlayfs does not have address_space operations
It does not have its own page cache.
And here we see my total lack of knowledge of overlayfs coming into play
:) Thanks for pointing this out.
In that case, I object even further to the original of course...
The file in vma->vm_file is not an overlayfs file at all - it is the
real (e.g. ext4) file when returning from
ovl_mmap() => backing_file_mmap(),
so I have very little clue why the proposed solution even works,
but it certainly does not look correct.
I think then, Jinjiang, in this case you ought to go back to the drawing
board and reconsider what might be the underlying issue here.
When userspace calls the mmap syscall, the call trace is as follows:
do_mmap
  __get_unmapped_area
  mmap_region
    mmap_file
      ovl_mmap
__get_unmapped_area() picks the address to mmap at, and the file at that point is an overlayfs file.
Since ovl_file_operations doesn't define a get_unmapped_area callback, __get_unmapped_area()
falls back to mm_get_unmapped_area_vmflags(), which doesn't return an address aligned to
the large folio size.
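
The fallback described above can be illustrated with a toy userspace model. This is not kernel code: every toy_* name is hypothetical, the page and PMD sizes are assumptions (4 KiB and 2 MiB), and the real functions take more parameters and do real address-space searches. The model only captures the dispatch: a file whose operations provide a callback gets that callback's alignment, and a file without one gets the generic, page-aligned path.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL       /* assumed base page size */
#define TOY_PMD_SIZE  (2ULL << 20)  /* assumed PMD size (2 MiB) */

/* Toy stand-in for the get_unmapped_area member of file_operations. */
struct toy_file_ops {
    uint64_t (*get_unmapped_area)(uint64_t hint);
};

static uint64_t toy_round_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Models the generic path: the result is only page-aligned. */
static uint64_t toy_generic_area(uint64_t hint)
{
    return toy_round_up(hint, TOY_PAGE_SIZE);
}

/* Models a THP-aware callback: the result is PMD-aligned. */
static uint64_t toy_thp_area(uint64_t hint)
{
    return toy_round_up(hint, TOY_PMD_SIZE);
}

/* Models the dispatch: use the file's callback if set, else fall back. */
static uint64_t toy_get_unmapped_area(const struct toy_file_ops *ops,
                                      uint64_t hint)
{
    if (ops && ops->get_unmapped_area)
        return ops->get_unmapped_area(hint);
    return toy_generic_area(hint);
}

/* An "ext4-like" file sets the callback; an "overlayfs-like" one does not. */
static const struct toy_file_ops toy_ext4_like_ops = {
    .get_unmapped_area = toy_thp_area,
};
static const struct toy_file_ops toy_ovl_like_ops = {
    .get_unmapped_area = NULL,
};
```

In this model, the same hint yields a PMD-aligned address through toy_ext4_like_ops and a merely page-aligned one through toy_ovl_like_ops, which mirrors the difference described in the call trace.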
Thanks,
Amir.
Cheers, Lorenzo