Re: [PATCH -next] ovl: respect underlying filesystem's get_unmapped_area()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



To be clear - I'm not accepting the export of __get_unmapped_area() so if
you depend on this for this approach, you can't take this approach.

It's an internal implementation detail. That you choose to make your
filesystem possibly a module doesn't mean that mm is required to export
internal impl details to you. Sorry.

To rescind this would require a very strong argument, you have not provided
it.

On Fri, Dec 06, 2024 at 11:35:08AM +0800, Jinjiang Tu wrote:
>
> 在 2024/12/5 23:04, Lorenzo Stoakes 写道:
> > + Matthew for large folio aspect
> >
> > On Thu, Dec 05, 2024 at 10:30:38PM +0800, Jinjiang Tu wrote:
> > > During our tests in containers, there is a read-only file (i.e., shared
> > > libraies) in the overlayfs filesystem, and the underlying filesystem is
> > > ext4, which supports large folio. We mmap the file with PROT_READ prot,
> > > and then call madvise(MADV_COLLAPSE) for it. However, the madvise call
> > > fails and returns EINVAL.
> > >
> > > The reason is that the mapping address isn't aligned to PMD size. Since
> > > overlayfs doesn't support large folio, __get_unmapped_area() doesn't call
> > > thp_get_unmapped_area() to get a THP aligned address.
> > >
> > > To fix it, call get_unmapped_area() with the realfile.
> > Isn't the correct solution to get overlayfs to support large folios?
> >
> > > Besides, since overlayfs may be built with CONFIG_OVERLAY_FS=m, we should
> > > export get_unmapped_area().
> > Yeah, not in favour of this at all. This is an internal implementation
> > detail. It seems like you're trying to hack your way into avoiding
> > providing support for large folios and to hand it off to the underlying
> > file system.
> >
> > Again, why don't you just support large folios in overlayfs?
> >
> > Literally no other file system or driver appears to make use of this
> > directly in this manner.
> >
> > And there's absolutely no way this should be exported non-GPL as if it were
> > unavoidable core functionality that everyone needs. Only you seem to...
> >
> > > Signed-off-by: Jinjiang Tu <tujinjiang@xxxxxxxxxx>
> > > ---
> > >   fs/overlayfs/file.c | 20 ++++++++++++++++++++
> > >   mm/mmap.c           |  1 +
> > >   2 files changed, 21 insertions(+)
> > >
> > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > index 969b458100fe..d0dcf675ebe8 100644
> > > --- a/fs/overlayfs/file.c
> > > +++ b/fs/overlayfs/file.c
> > > @@ -653,6 +653,25 @@ static int ovl_flush(struct file *file, fl_owner_t id)
> > >   	return err;
> > >   }
> > >
> > > +static unsigned long ovl_get_unmapped_area(struct file *file,
> > > +		unsigned long addr, unsigned long len, unsigned long pgoff,
> > > +		unsigned long flags)
> > > +{
> > > +	struct file *realfile;
> > > +	const struct cred *old_cred;
> > > +	unsigned long ret;
> > > +
> > > +	realfile = ovl_real_file(file);
> > > +	if (IS_ERR(realfile))
> > > +		return PTR_ERR(realfile);
> > > +
> > > +	old_cred = ovl_override_creds(file_inode(file)->i_sb);
> > > +	ret = get_unmapped_area(realfile, addr, len, pgoff, flags);
> > > +	ovl_revert_creds(old_cred);
> > Why are you overriding credentials, then reinstating them here? That
> > seems... iffy? I knew nothing about overlayfs so this may just be a
> > misunderstanding...
>
> I refer to other file operations in overlayfs (i.e., ovl_fallocate, backing_file_mmap).
> Since get_unmapped_area() has security related operations (e.g., security_mmap_addr()),
> We should call it with the cred of the underlying file.
>
> >
> > > +
> > > +	return ret;
> > > +}
> > > +
> > >   const struct file_operations ovl_file_operations = {
> > >   	.open		= ovl_open,
> > >   	.release	= ovl_release,
> > > @@ -661,6 +680,7 @@ const struct file_operations ovl_file_operations = {
> > >   	.write_iter	= ovl_write_iter,
> > >   	.fsync		= ovl_fsync,
> > >   	.mmap		= ovl_mmap,
> > > +	.get_unmapped_area = ovl_get_unmapped_area,
> > >   	.fallocate	= ovl_fallocate,
> > >   	.fadvise	= ovl_fadvise,
> > >   	.flush		= ovl_flush,
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index 16f8e8be01f8..60eb1ff7c9a8 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -913,6 +913,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> > >   	error = security_mmap_addr(addr);
> > >   	return error ? error : addr;
> > >   }
> > > +EXPORT_SYMBOL(__get_unmapped_area);
> > We'll need a VERY good reason to export this internal implementation
> > detail, and if that were provided we'd need a VERY good reason for it not
> > to be GPL.
> >
> > This just seems to be a cheap way of invoking (),
> > maybe, if it is being used by the underlying file system.
>
> But the underlying file system may not support large folio. In this case,
> the mmap address doesn't need to be aligned with THP size.

But it'd not cause any problems to just do that anyway right? I don't think
many people think 'oh no I have a PMD aligned mapping now what will I do'?

Again - the right solution here is to handle large folios in overlayfs as
far as I can tell.

In any case as per the above, we're just not exporting
__get_unmapped_area(), sorry.

>
> >
> > And again... why not just add large folio support? We can't just take a
> > hack here.
> >
> > >   unsigned long
> > >   mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
> > > --
> > > 2.34.1
> > >




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux