To be clear - I'm not accepting the export of __get_unmapped_area() so if you depend on this for this approach, you can't take this approach. It's an internal implementation detail. That you choose to make your filesystem possibly a module doesn't mean that mm is required to export internal impl details to you. Sorry. To rescind this would require a very strong argument, you have not provided it. On Fri, Dec 06, 2024 at 11:35:08AM +0800, Jinjiang Tu wrote: > > 在 2024/12/5 23:04, Lorenzo Stoakes 写道: > > + Matthew for large folio aspect > > > > On Thu, Dec 05, 2024 at 10:30:38PM +0800, Jinjiang Tu wrote: > > > During our tests in containers, there is a read-only file (i.e., shared > > > libraies) in the overlayfs filesystem, and the underlying filesystem is > > > ext4, which supports large folio. We mmap the file with PROT_READ prot, > > > and then call madvise(MADV_COLLAPSE) for it. However, the madvise call > > > fails and returns EINVAL. > > > > > > The reason is that the mapping address isn't aligned to PMD size. Since > > > overlayfs doesn't support large folio, __get_unmapped_area() doesn't call > > > thp_get_unmapped_area() to get a THP aligned address. > > > > > > To fix it, call get_unmapped_area() with the realfile. > > Isn't the correct solution to get overlayfs to support large folios? > > > > > Besides, since overlayfs may be built with CONFIG_OVERLAY_FS=m, we should > > > export get_unmapped_area(). > > Yeah, not in favour of this at all. This is an internal implementation > > detail. It seems like you're trying to hack your way into avoiding > > providing support for large folios and to hand it off to the underlying > > file system. > > > > Again, why don't you just support large folios in overlayfs? > > > > Literally no other file system or driver appears to make use of this > > directly in this manner. > > > > And there's absolutely no way this should be exported non-GPL as if it were > > unavoidable core functionality that everyone needs. Only you seem to... > > > > > Signed-off-by: Jinjiang Tu <tujinjiang@xxxxxxxxxx> > > > --- > > > fs/overlayfs/file.c | 20 ++++++++++++++++++++ > > > mm/mmap.c | 1 + > > > 2 files changed, 21 insertions(+) > > > > > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c > > > index 969b458100fe..d0dcf675ebe8 100644 > > > --- a/fs/overlayfs/file.c > > > +++ b/fs/overlayfs/file.c > > > @@ -653,6 +653,25 @@ static int ovl_flush(struct file *file, fl_owner_t id) > > > return err; > > > } > > > > > > +static unsigned long ovl_get_unmapped_area(struct file *file, > > > + unsigned long addr, unsigned long len, unsigned long pgoff, > > > + unsigned long flags) > > > +{ > > > + struct file *realfile; > > > + const struct cred *old_cred; > > > + unsigned long ret; > > > + > > > + realfile = ovl_real_file(file); > > > + if (IS_ERR(realfile)) > > > + return PTR_ERR(realfile); > > > + > > > + old_cred = ovl_override_creds(file_inode(file)->i_sb); > > > + ret = get_unmapped_area(realfile, addr, len, pgoff, flags); > > > + ovl_revert_creds(old_cred); > > Why are you overriding credentials, then reinstating them here? That > > seems... iffy? I knew nothing about overlayfs so this may just be a > > misunderstanding... > > I refer to other file operations in overlayfs (i.e., ovl_fallocate, backing_file_mmap). > Since get_unmapped_area() has security related operations (e.g., security_mmap_addr()), > We should call it with the cred of the underlying file. > > > > > > + > > > + return ret; > > > +} > > > + > > > const struct file_operations ovl_file_operations = { > > > .open = ovl_open, > > > .release = ovl_release, > > > @@ -661,6 +680,7 @@ const struct file_operations ovl_file_operations = { > > > .write_iter = ovl_write_iter, > > > .fsync = ovl_fsync, > > > .mmap = ovl_mmap, > > > + .get_unmapped_area = ovl_get_unmapped_area, > > > .fallocate = ovl_fallocate, > > > .fadvise = ovl_fadvise, > > > .flush = ovl_flush, > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index 16f8e8be01f8..60eb1ff7c9a8 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -913,6 +913,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, > > > error = security_mmap_addr(addr); > > > return error ? error : addr; > > > } > > > +EXPORT_SYMBOL(__get_unmapped_area); > > We'll need a VERY good reason to export this internal implementation > > detail, and if that were provided we'd need a VERY good reason for it not > > to be GPL. > > > > This just seems to be a cheap way of invoking (), > > maybe, if it is being used by the underlying file system. > > But the underlying file system may not support large folio. In this case, > the mmap address doesn't need to be aligned with THP size. But it'd not cause any problems to just do that anyway right? I don't think many people think 'oh no I have a PMD aligned mapping now what will I do'? Again - the right solution here is to handle large folios in overlayfs as far as I can tell. In any case as per the above, we're just not exporting __get_unmapped_area(), sorry. > > > > > And again... why not just add large folio support? We can't just take a > > hack here. > > > > > unsigned long > > > mm_get_unmapped_area(struct mm_struct *mm, struct file *file, > > > -- > > > 2.34.1 > > >