+ Matthew for large folio aspect On Thu, Dec 05, 2024 at 10:30:38PM +0800, Jinjiang Tu wrote: > During our tests in containers, there is a read-only file (i.e., shared > libraies) in the overlayfs filesystem, and the underlying filesystem is > ext4, which supports large folio. We mmap the file with PROT_READ prot, > and then call madvise(MADV_COLLAPSE) for it. However, the madvise call > fails and returns EINVAL. > > The reason is that the mapping address isn't aligned to PMD size. Since > overlayfs doesn't support large folio, __get_unmapped_area() doesn't call > thp_get_unmapped_area() to get a THP aligned address. > > To fix it, call get_unmapped_area() with the realfile. Isn't the correct solution to get overlayfs to support large folios? > > Besides, since overlayfs may be built with CONFIG_OVERLAY_FS=m, we should > export get_unmapped_area(). Yeah, not in favour of this at all. This is an internal implementation detail. It seems like you're trying to hack your way into avoiding providing support for large folios and to hand it off to the underlying file system. Again, why don't you just support large folios in overlayfs? Literally no other file system or driver appears to make use of this directly in this manner. And there's absolutely no way this should be exported non-GPL as if it were unavoidable core functionality that everyone needs. Only you seem to... > > Signed-off-by: Jinjiang Tu <tujinjiang@xxxxxxxxxx> > --- > fs/overlayfs/file.c | 20 ++++++++++++++++++++ > mm/mmap.c | 1 + > 2 files changed, 21 insertions(+) > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c > index 969b458100fe..d0dcf675ebe8 100644 > --- a/fs/overlayfs/file.c > +++ b/fs/overlayfs/file.c > @@ -653,6 +653,25 @@ static int ovl_flush(struct file *file, fl_owner_t id) > return err; > } > > +static unsigned long ovl_get_unmapped_area(struct file *file, > + unsigned long addr, unsigned long len, unsigned long pgoff, > + unsigned long flags) > +{ > + struct file *realfile; > + const struct cred *old_cred; > + unsigned long ret; > + > + realfile = ovl_real_file(file); > + if (IS_ERR(realfile)) > + return PTR_ERR(realfile); > + > + old_cred = ovl_override_creds(file_inode(file)->i_sb); > + ret = get_unmapped_area(realfile, addr, len, pgoff, flags); > + ovl_revert_creds(old_cred); Why are you overriding credentials, then reinstating them here? That seems... iffy? I knew nothing about overlayfs so this may just be a misunderstanding... > + > + return ret; > +} > + > const struct file_operations ovl_file_operations = { > .open = ovl_open, > .release = ovl_release, > @@ -661,6 +680,7 @@ const struct file_operations ovl_file_operations = { > .write_iter = ovl_write_iter, > .fsync = ovl_fsync, > .mmap = ovl_mmap, > + .get_unmapped_area = ovl_get_unmapped_area, > .fallocate = ovl_fallocate, > .fadvise = ovl_fadvise, > .flush = ovl_flush, > diff --git a/mm/mmap.c b/mm/mmap.c > index 16f8e8be01f8..60eb1ff7c9a8 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -913,6 +913,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, > error = security_mmap_addr(addr); > return error ? error : addr; > } > +EXPORT_SYMBOL(__get_unmapped_area); We'll need a VERY good reason to export this internal implementation detail, and if that were provided we'd need a VERY good reason for it not to be GPL. This just seems to be a cheap way of invoking thp_get_unmapped_area(), maybe, if it is being used by the underlying file system. And again... why not just add large folio support? We can't just take a hack here. > > unsigned long > mm_get_unmapped_area(struct mm_struct *mm, struct file *file, > -- > 2.34.1 >