Re: [PATCH 0/2] Avert possible deadlock with splice() and fanotify

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 30, 2023 at 3:37 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Thu 30-11-23 12:07:23, Amir Goldstein wrote:
> > On Thu, Nov 30, 2023 at 10:32 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > >
> > > On Wed, Nov 29, 2023 at 10:07 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > >
> > > > Christian,
> > > >
> > > > Josef has helped me see the light and figure out how to avoid the
> > > > possible deadlock, which involves:
> > > > - splice() from source file in a loop mounted fs to dest file in
> > > >   a host fs, where the loop image file is
> > > > - fsfreeze on host fs
> > > > - write to host fs in context of fanotify permission event handler
> > > >   (FAN_ACCESS_PERM) on the splice source file
> > > >
> > > > The first patch should not be changing any logic.
> > > > I only build tested the ceph patch, so hoping to get an
> > > > Acked-by/Tested-by from Jeff.
> > > >
> > > > The second patch rids us of the deadlock by not holding
> > > > file_start_write() while reading from splice source file.
> > > >
> > >
> > > OOPS, I missed another corner case:
> > > The COPY_FILE_SPLICE fallback of server-side-copy in nfsd/ksmbd
> > > needs to use the start-write-safe variant of do_splice_direct(),
> > > because in this case src and dst can be on any two fs.
> > > Expect an extra patch in v2.
> > >
> >
> > For the interested, see server-side-copy patch below.
> > Pushed to branch start-write-safe [1], but will wait with v2 until
> > I get comments on v1.
> >
> > Thanks,
> > Amir.
> >
> > [1] https://github.com/amir73il/linux/commits/start-write-safe
> >
> > Author: Amir Goldstein <amir73il@xxxxxxxxx>
> > Date:   Thu Nov 30 11:42:50 2023 +0200
> >
> >     fs: use do_splice_direct() for nfsd/ksmbd server-side-copy
> >
> >     nfsd/ksmbd call vfs_copy_file_range() with flag COPY_FILE_SPLICE to
> >     perform kernel copy between two files on any two filesystems.
> >
> >     Splicing input file, while holding file_start_write() on the output file
> >     which is on a different sb, posses a risk for fanotify related deadlocks.
> >
> >     We only need to call splice_file_range() from within the context of
> >     ->copy_file_range() filesystem methods with file_start_write() held.
> >
> >     To avoid the possible deadlocks, always use do_splice_direct() instead of
> >     splice_file_range() for the kernel copy fallback in vfs_copy_file_range()
> >     without holding file_start_write().
> >
> >     Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> >
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index 0bc99f38e623..12583e32aa6d 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1565,11 +1565,18 @@ ssize_t vfs_copy_file_range(struct file
> > *file_in, loff_t pos_in,
> >          * and which filesystems do not, that will allow userspace tools to
> >          * make consistent desicions w.r.t using copy_file_range().
> >          *
> > -        * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE.
> > +        * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE
> > +        * for server-side-copy between any two sb.
> > +        *
> > +        * In any case, we call do_splice_direct() and not splice_file_range(),
> > +        * without file_start_write() held, to avoid possible deadlocks related
> > +        * to splicing from input file, while file_start_write() is held on
> > +        * the output file on a different sb.
> >          */
> > -       ret = generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> > -                                     flags);
> > +       file_end_write(file_out);
> >
> > +       ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
> > +                              min_t(size_t, len, MAX_RW_COUNT), 0);
> >  done:
> >         if (ret > 0) {
> >                 fsnotify_access(file_in);
> > @@ -1581,8 +1588,6 @@ ssize_t vfs_copy_file_range(struct file
> > *file_in, loff_t pos_in,
> >         inc_syscr(current);
> >         inc_syscw(current);
> >
> > -       file_end_write(file_out);
> > -
>
> This file_end_write() is also used by the paths using ->copy_file_range()
> and ->remap_file_range() so you need to balance those...

You're right! already found that bug and fixed in my branch:

diff --git a/fs/read_write.c b/fs/read_write.c
index 0bc99f38e623..e0c2c1b5962b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1421,6 +1421,10 @@ ssize_t generic_copy_file_range(struct file
*file_in, loff_t pos_in,
                                struct file *file_out, loff_t pos_out,
                                size_t len, unsigned int flags)
 {
+       /* May only be called from within ->copy_file_range() methods */
+       if (WARN_ON_ONCE(flags))
+               return -EINVAL;
+
        return splice_file_range(file_in, &pos_in, file_out, &pos_out,
                                 min_t(size_t, len, MAX_RW_COUNT));
 }
@@ -1541,19 +1545,22 @@ ssize_t vfs_copy_file_range(struct file
*file_in, loff_t pos_in,
                ret = file_out->f_op->copy_file_range(file_in, pos_in,
                                                      file_out, pos_out,
                                                      len, flags);
-               goto done;
-       }
-
-       if (!splice && file_in->f_op->remap_file_range &&
-           file_inode(file_in)->i_sb == file_inode(file_out)->i_sb) {
+       } else if (!splice && file_in->f_op->remap_file_range &&
+                  file_inode(file_in)->i_sb == file_inode(file_out)->i_sb) {
                ret = file_in->f_op->remap_file_range(file_in, pos_in,
                                file_out, pos_out,
                                min_t(loff_t, MAX_RW_COUNT, len),
                                REMAP_FILE_CAN_SHORTEN);
-               if (ret > 0)
-                       goto done;
+               /* fallback to splice */
+               if (ret <= 0)
+                       splice = true;
        }

+       file_end_write(file_out);
+
+       if (!splice)
+               goto done;
+
        /*
         * We can get here for same sb copy of filesystems that do not implement
         * ->copy_file_range() in case filesystem does not support clone or in
@@ -1565,11 +1572,16 @@ ssize_t vfs_copy_file_range(struct file
*file_in, loff_t pos_in,
         * and which filesystems do not, that will allow userspace tools to
         * make consistent desicions w.r.t using copy_file_range().
         *
-        * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE.
+        * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE
+        * for server-side-copy between any two sb.
+        *
+        * In any case, we call do_splice_direct() and not splice_file_range(),
+        * without file_start_write() held, to avoid possible deadlocks related
+        * to splicing from input file, while file_start_write() is held on
+        * the output file on a different sb.
         */
-       ret = generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
-                                     flags);
-
+       ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
+                              min_t(size_t, len, MAX_RW_COUNT), 0);
 done:
        if (ret > 0) {
                fsnotify_access(file_in);
@@ -1581,8 +1593,6 @@ ssize_t vfs_copy_file_range(struct file
*file_in, loff_t pos_in,
        inc_syscr(current);
        inc_syscw(current);

-       file_end_write(file_out);
-
        return ret;
 }
 EXPORT_SYMBOL(vfs_copy_file_range);





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux