Re: [PATCH] fs: support cross-type copy_file_range in overlayfs.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 23, 2025 at 9:18 AM Han-Wen Nienhuys <hanwen@xxxxxxxxxxx> wrote:
>
> Introduces the FOP_CROSS_TYPE_COPYFILE flag, and implements it for
> overlayfs. This enables copy_file_range between an overlayfs mount and
> its constituent layers.
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@xxxxxxxxxxx>
> ---

>
> Whoops, I should have sent a cover letter.

FYI, with a single patch, you can write the "cover letter" in the patch itself
after the --- line, but with my suggestion below, you should probably split
this to one vfs patch to change copy_file_range() interface and another
patch to implement ovl_copy_file_range() support.

>
> I thought this could speed a lot of operations in docker/podman.

What sort of operations are we talking about?
The only relevant use case as far as I can tell is copying
from a shared lower fs directly into overlayfs, but why is this useful?
to which scenario?
Do you have an example of use cases for copy to another direction?

This should be included in the cover letter as well as performance
speed figures if the improvement is dubbed a performance improvement.

>
> This is my first time patching the kernel, so please be gentle.

ok :)

The change that you are proposing is not constrained to overlayfs.
The FOP_CROSS_TYPE_COPYFILE change is a vfs change,
so I added linux-fsdevel list to CC (please do so for follow up revisions
i.e. PATCH v2...).

Also added to CC the developers Dave Chinner and Darrick J. Wong
who were actively involved in shaping the copy_file_range()
vfs API to the way that it is right now.

> I don't really know what I'm doing, but it seems to work. I tested this
> manually using qemu; is there an official test suite where I could add
> a test?

https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
See: README.overlay
Try ./check -overlay -g copy_range

and specifically:
$ git grep copy_file_range.*cross
tests/generic/565:# Exercise copy_file_range() across devices supported by some
tests/generic/565:# 5dae222a5ff0 vfs: allow copy_file_range to copy
across devices

tests/generic are not doing overlayfs specific operations and never have code
that is aware of internals like $OVL_BASE_SCRATCH_MNT, they simply run
on top of whatever filesystems and on top of overlayfs with -overlayfs run flag.

For testing your change, you will need a tests/overlay specific copy_range test.
Let's talk about that if that becomes relevant.

More comments to the point inline

>  fs/overlayfs/file.c | 60 ++++++++++++++++++++++++++++++---------------
>  fs/read_write.c     | 15 ++++++++----
>  include/linux/fs.h  |  4 +++
>  3 files changed, 54 insertions(+), 25 deletions(-)
>
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 969b458100fe..97b394737251 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -536,6 +536,9 @@ static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
>         return ret;
>  }
>
> +static ssize_t ovl_copy_file_range(struct file *file_in, loff_t pos_in,
> +                                  struct file *file_out, loff_t pos_out,
> +                                  size_t len, unsigned int flags);
>  enum ovl_copyop {
>         OVL_COPY,
>         OVL_CLONE,
> @@ -547,30 +550,42 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
>                             loff_t len, unsigned int flags, enum ovl_copyop op)
>  {
>         struct inode *inode_out = file_inode(file_out);
> -       struct file *realfile_in, *realfile_out;
> +       struct file *realfile_in = file_in;
> +       struct file *realfile_out = file_out;
>         const struct cred *old_cred;
>         loff_t ret;
> +       bool in_overlay = file_in->f_op->copy_file_range == &ovl_copy_file_range;
> +       bool out_overlay = file_out->f_op->copy_file_range == &ovl_copy_file_range;

I prefer if we did not add the support for copying out of overlayfs
unless there is a *very* good reason to do so and then
out_overlay is not needed and all the conditions below not needed.
It is always true, because of how ->copy_file_range() is called (see below)

The only case that you could improve is copy from a file in the sb of upper fs
to the overlayfs sb. Anything else should return -EXDEV.

Note that cross-sb copy_file_range() supported by cifs_copy_file_range()
and nfs4_copy_file_range() is quite close to what you are trying to do here
I will propose below a way to unify those cases.


>
> -       inode_lock(inode_out);
> -       if (op != OVL_DEDUPE) {
> -               /* Update mode */
> -               ovl_copyattr(inode_out);
> -               ret = file_remove_privs(file_out);
> -               if (ret)
> -                       goto out_unlock;
> +       if (WARN_ON_ONCE(!in_overlay && !out_overlay))
> +               return -EXDEV;
> +
> +       if (in_overlay) {
> +               realfile_in = ovl_real_file(file_in);
> +               ret = PTR_ERR(realfile_in);
> +               if (IS_ERR(realfile_in))
> +                       return ret;
>         }
>
> -       realfile_out = ovl_real_file(file_out);
> -       ret = PTR_ERR(realfile_out);
> -       if (IS_ERR(realfile_out))
> -               goto out_unlock;
> +       if (out_overlay) {
> +               inode_lock(inode_out);
>
> -       realfile_in = ovl_real_file(file_in);
> -       ret = PTR_ERR(realfile_in);
> -       if (IS_ERR(realfile_in))
> -               goto out_unlock;
> +               if (op != OVL_DEDUPE) {
> +                       /* Update mode */
> +                       ovl_copyattr(inode_out);
> +                       ret = file_remove_privs(file_out);
> +                       if (ret)
> +                               goto out_unlock;
> +               }
> +
> +               realfile_out = ovl_real_file(file_out);
> +               ret = PTR_ERR(realfile_out);
> +               if (IS_ERR(realfile_out))
> +                       goto out_unlock;
> +
> +               old_cred = ovl_override_creds(file_inode(file_out)->i_sb);
> +       }
>
> -       old_cred = ovl_override_creds(file_inode(file_out)->i_sb);
>         switch (op) {
>         case OVL_COPY:
>                 ret = vfs_copy_file_range(realfile_in, pos_in,
> @@ -588,13 +603,16 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
>                                                 flags);
>                 break;
>         }
> -       ovl_revert_creds(old_cred);
>
>         /* Update size */
> -       ovl_file_modified(file_out);
> +       if (out_overlay) {
> +               ovl_file_modified(file_out);
> +               ovl_revert_creds(old_cred);
> +       }
>
>  out_unlock:
> -       inode_unlock(inode_out);
> +       if (out_overlay)
> +               inode_unlock(inode_out);
>
>         return ret;
>  }
> @@ -654,6 +672,8 @@ static int ovl_flush(struct file *file, fl_owner_t id)
>  }
>
>  const struct file_operations ovl_file_operations = {
> +       .fop_flags      = FOP_CROSS_TYPE_COPYFILE,
> +

I suggest instead FOP_CROSS_SB_COPYFILE and set it
also for nfs4_file_operations (CONFIG_NFS_V4_2) and for
all the cifs_file_ops flavors.

cross-sb-copy is a private case of cross-fstype-copy

nfs4_copy_file_range() already verifies that the source is
from the same fstype with wrong comment that should be fixed:

@@ -143,7 +143,7 @@ static ssize_t __nfs4_copy_file_range(struct file
*file_in, loff_t pos_in,
        ssize_t ret;
        bool sync = false;

-       /* Only offload copy if superblock is the same */
+       /* Only offload copy if source is also nfs42 */
        if (file_in->f_op != &nfs4_file_operations)
                return -EXDEV;

For cifs, you'd need to add an explicit check:

@@ -1382,6 +1382,10 @@ ssize_t cifs_file_copychunk_range(unsigned int xid,

        cifs_dbg(FYI, "copychunk range\n");

+       /* Only offload copy if source is also cifs */
+       if (src_file->f_op->copy_file_range != cifs_copy_file_range)
+               return -EXDEV;
+

and then...

>         .open           = ovl_open,
>         .release        = ovl_release,
>         .llseek         = ovl_llseek,
> diff --git a/fs/read_write.c b/fs/read_write.c
> index a6133241dfb8..93618441a02d 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1489,7 +1489,7 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
>          * We allow some filesystems to handle cross sb copy, but passing
>          * a file of the wrong filesystem type to filesystem driver can result
>          * in an attempt to dereference the wrong type of ->private_data, so
> -        * avoid doing that until we really have a good reason.
> +        * avoid doing unless FOP_CROSS_TYPE_COPYFILE is set.
>          *
>          * nfs and cifs define several different file_system_type structures
>          * and several different sets of file_operations, but they all end up
> @@ -1497,6 +1497,9 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
>          */
>         if (flags & COPY_FILE_SPLICE) {
>                 /* cross sb splice is allowed */
> +       } else if (file_in->f_op->fop_flags & FOP_CROSS_TYPE_COPYFILE ||
> +                  file_out->f_op->fop_flags & FOP_CROSS_TYPE_COPYFILE) {
> +               /* file system understands how to cross FS types */
>         } else if (file_out->f_op->copy_file_range) {
>                 if (file_in->f_op->copy_file_range !=
>                     file_out->f_op->copy_file_range)

This becomes something like:

@@ -1477,6 +1477,8 @@ static int generic_copy_file_checks(struct file
*file_in, loff_t pos_in,
 {
        struct inode *inode_in = file_inode(file_in);
        struct inode *inode_out = file_inode(file_out);
+       bool samesb = file_inode(file_in)->i_sb == file_inode(file_out)->i_sb;
+       bool allow_cross_sb = file_out->f_op->fop_flags & FOP_CROSS_SB_COPYFILE;
        uint64_t count = *req_count;
        loff_t size_in;
        int ret;
@@ -1489,19 +1491,14 @@ static int generic_copy_file_checks(struct
file *file_in, loff_t pos_in,
         * We allow some filesystems to handle cross sb copy, but passing
         * a file of the wrong filesystem type to filesystem driver can result
         * in an attempt to dereference the wrong type of ->private_data, so
-        * avoid doing that until we really have a good reason.
-        *
-        * nfs and cifs define several different file_system_type structures
-        * and several different sets of file_operations, but they all end up
-        * using the same ->copy_file_range() function pointer.
+        * avoid doing that unless a filesystem declares cross sb copy support.
         */
        if (flags & COPY_FILE_SPLICE) {
                /* cross sb splice is allowed */
        } else if (file_out->f_op->copy_file_range) {
-               if (file_in->f_op->copy_file_range !=
-                   file_out->f_op->copy_file_range)
+               if (!samesb && !allow_cross_sb)
                        return -EXDEV;
-       } else if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb) {
+       } else if (!samesb) {
                return -EXDEV;
        }

and stops relying on the heuristic
"cifs define several different file_system_type structures
 and several different sets of file_operations, but they all end up
 using the same ->copy_file_range() function pointer"

> @@ -1576,10 +1579,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>          * same sb using clone, but for filesystems where both clone and copy
>          * are supported (e.g. nfs,cifs), we only call the copy method.
>          */
> -       if (!splice && file_out->f_op->copy_file_range) {
> -               ret = file_out->f_op->copy_file_range(file_in, pos_in,
> -                                                     file_out, pos_out,
> -                                                     len, flags);
> +       if (!splice && (file_in->f_op->copy_file_range || file_out->f_op->copy_file_range)) {
> +               ret =  (file_in->f_op->copy_file_range ?
> +                       file_in->f_op->copy_file_range :
> +                       file_out->f_op->copy_file_range)(file_in, pos_in,
> +                                                        file_out, pos_out,
> +                                                        len, flags);

I prefer not to do that unless there is a *very* good reason.
copy from another sb source is less risky IMO.

Thanks,
Amir.





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux