在 2024/12/5 23:04, Lorenzo Stoakes 写道:
+ Matthew for large folio aspect
On Thu, Dec 05, 2024 at 10:30:38PM +0800, Jinjiang Tu wrote:
During our tests in containers, there is a read-only file (i.e., shared
libraies) in the overlayfs filesystem, and the underlying filesystem is
ext4, which supports large folio. We mmap the file with PROT_READ prot,
and then call madvise(MADV_COLLAPSE) for it. However, the madvise call
fails and returns EINVAL.
The reason is that the mapping address isn't aligned to PMD size. Since
overlayfs doesn't support large folio, __get_unmapped_area() doesn't call
thp_get_unmapped_area() to get a THP aligned address.
To fix it, call get_unmapped_area() with the realfile.
Isn't the correct solution to get overlayfs to support large folios?
Besides, since overlayfs may be built with CONFIG_OVERLAY_FS=m, we should
export get_unmapped_area().
Yeah, not in favour of this at all. This is an internal implementation
detail. It seems like you're trying to hack your way into avoiding
providing support for large folios and to hand it off to the underlying
file system.
Again, why don't you just support large folios in overlayfs?
Literally no other file system or driver appears to make use of this
directly in this manner.
And there's absolutely no way this should be exported non-GPL as if it were
unavoidable core functionality that everyone needs. Only you seem to...
Signed-off-by: Jinjiang Tu <tujinjiang@xxxxxxxxxx>
---
fs/overlayfs/file.c | 20 ++++++++++++++++++++
mm/mmap.c | 1 +
2 files changed, 21 insertions(+)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 969b458100fe..d0dcf675ebe8 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -653,6 +653,25 @@ static int ovl_flush(struct file *file, fl_owner_t id)
return err;
}
+static unsigned long ovl_get_unmapped_area(struct file *file,
+ unsigned long addr, unsigned long len, unsigned long pgoff,
+ unsigned long flags)
+{
+ struct file *realfile;
+ const struct cred *old_cred;
+ unsigned long ret;
+
+ realfile = ovl_real_file(file);
+ if (IS_ERR(realfile))
+ return PTR_ERR(realfile);
+
+ old_cred = ovl_override_creds(file_inode(file)->i_sb);
+ ret = get_unmapped_area(realfile, addr, len, pgoff, flags);
+ ovl_revert_creds(old_cred);
Why are you overriding credentials, then reinstating them here? That
seems... iffy? I knew nothing about overlayfs so this may just be a
misunderstanding...
I refer to other file operations in overlayfs (i.e., ovl_fallocate, backing_file_mmap).
Since get_unmapped_area() has security related operations (e.g., security_mmap_addr()),
We should call it with the cred of the underlying file.
+
+ return ret;
+}
+
const struct file_operations ovl_file_operations = {
.open = ovl_open,
.release = ovl_release,
@@ -661,6 +680,7 @@ const struct file_operations ovl_file_operations = {
.write_iter = ovl_write_iter,
.fsync = ovl_fsync,
.mmap = ovl_mmap,
+ .get_unmapped_area = ovl_get_unmapped_area,
.fallocate = ovl_fallocate,
.fadvise = ovl_fadvise,
.flush = ovl_flush,
diff --git a/mm/mmap.c b/mm/mmap.c
index 16f8e8be01f8..60eb1ff7c9a8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -913,6 +913,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
error = security_mmap_addr(addr);
return error ? error : addr;
}
+EXPORT_SYMBOL(__get_unmapped_area);
We'll need a VERY good reason to export this internal implementation
detail, and if that were provided we'd need a VERY good reason for it not
to be GPL.
This just seems to be a cheap way of invoking (),
maybe, if it is being used by the underlying file system.
But the underlying file system may not support large folio. In this case,
the mmap address doesn't need to be aligned with THP size.
And again... why not just add large folio support? We can't just take a
hack here.
unsigned long
mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
--
2.34.1