Re: [PATCH 0/2] Avert possible deadlock with splice() and fanotify

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Nov 30, 2023 at 10:32 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Wed, Nov 29, 2023 at 10:07 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > Christian,
> >
> > Josef has helped me see the light and figure out how to avoid the
> > possible deadlock, which involves:
> > - splice() from source file in a loop mounted fs to dest file in
> >   a host fs, where the loop image file is
> > - fsfreeze on host fs
> > - write to host fs in context of fanotify permission event handler
> >   (FAN_ACCESS_PERM) on the splice source file
> >
> > The first patch should not be changing any logic.
> > I only build tested the ceph patch, so hoping to get an
> > Acked-by/Tested-by from Jeff.
> >
> > The second patch rids us of the deadlock by not holding
> > file_start_write() while reading from splice source file.
> >
>
> OOPS, I missed another corner case:
> The COPY_FILE_SPLICE fallback of server-side-copy in nfsd/ksmbd
> needs to use the start-write-safe variant of do_splice_direct(),
> because in this case src and dst can be on any two fs.
> Expect an extra patch in v2.
>

For the interested, see server-side-copy patch below.
Pushed to branch start-write-safe [1], but will wait with v2 until
I get comments on v1.

Thanks,
Amir.

[1] https://github.com/amir73il/linux/commits/start-write-safe

Author: Amir Goldstein <amir73il@xxxxxxxxx>
Date:   Thu Nov 30 11:42:50 2023 +0200

    fs: use do_splice_direct() for nfsd/ksmbd server-side-copy

    nfsd/ksmbd call vfs_copy_file_range() with flag COPY_FILE_SPLICE to
    perform kernel copy between two files on any two filesystems.

    Splicing input file, while holding file_start_write() on the output file
    which is on a different sb, posses a risk for fanotify related deadlocks.

    We only need to call splice_file_range() from within the context of
    ->copy_file_range() filesystem methods with file_start_write() held.

    To avoid the possible deadlocks, always use do_splice_direct() instead of
    splice_file_range() for the kernel copy fallback in vfs_copy_file_range()
    without holding file_start_write().

    Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>

diff --git a/fs/read_write.c b/fs/read_write.c
index 0bc99f38e623..12583e32aa6d 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1565,11 +1565,18 @@ ssize_t vfs_copy_file_range(struct file
*file_in, loff_t pos_in,
         * and which filesystems do not, that will allow userspace tools to
         * make consistent desicions w.r.t using copy_file_range().
         *
-        * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE.
+        * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE
+        * for server-side-copy between any two sb.
+        *
+        * In any case, we call do_splice_direct() and not splice_file_range(),
+        * without file_start_write() held, to avoid possible deadlocks related
+        * to splicing from input file, while file_start_write() is held on
+        * the output file on a different sb.
         */
-       ret = generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
-                                     flags);
+       file_end_write(file_out);

+       ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
+                              min_t(size_t, len, MAX_RW_COUNT), 0);
 done:
        if (ret > 0) {
                fsnotify_access(file_in);
@@ -1581,8 +1588,6 @@ ssize_t vfs_copy_file_range(struct file
*file_in, loff_t pos_in,
        inc_syscr(current);
        inc_syscw(current);

-       file_end_write(file_out);
-
        return ret;
 }
 EXPORT_SYMBOL(vfs_copy_file_range);





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux