On Thu 30-11-23 12:07:23, Amir Goldstein wrote: > On Thu, Nov 30, 2023 at 10:32 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > On Wed, Nov 29, 2023 at 10:07 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > Christian, > > > > > > Josef has helped me see the light and figure out how to avoid the > > > possible deadlock, which involves: > > > - splice() from source file in a loop mounted fs to dest file in > > > a host fs, where the loop image file is > > > - fsfreeze on host fs > > > - write to host fs in context of fanotify permission event handler > > > (FAN_ACCESS_PERM) on the splice source file > > > > > > The first patch should not be changing any logic. > > > I only build tested the ceph patch, so hoping to get an > > > Acked-by/Tested-by from Jeff. > > > > > > The second patch rids us of the deadlock by not holding > > > file_start_write() while reading from splice source file. > > > > > > > OOPS, I missed another corner case: > > The COPY_FILE_SPLICE fallback of server-side-copy in nfsd/ksmbd > > needs to use the start-write-safe variant of do_splice_direct(), > > because in this case src and dst can be on any two fs. > > Expect an extra patch in v2. > > > > For the interested, see server-side-copy patch below. > Pushed to branch start-write-safe [1], but will wait with v2 until > I get comments on v1. > > Thanks, > Amir. > > [1] https://github.com/amir73il/linux/commits/start-write-safe > > Author: Amir Goldstein <amir73il@xxxxxxxxx> > Date: Thu Nov 30 11:42:50 2023 +0200 > > fs: use do_splice_direct() for nfsd/ksmbd server-side-copy > > nfsd/ksmbd call vfs_copy_file_range() with flag COPY_FILE_SPLICE to > perform kernel copy between two files on any two filesystems. > > Splicing input file, while holding file_start_write() on the output file > which is on a different sb, posses a risk for fanotify related deadlocks. > > We only need to call splice_file_range() from within the context of > ->copy_file_range() filesystem methods with file_start_write() held. > > To avoid the possible deadlocks, always use do_splice_direct() instead of > splice_file_range() for the kernel copy fallback in vfs_copy_file_range() > without holding file_start_write(). > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > diff --git a/fs/read_write.c b/fs/read_write.c > index 0bc99f38e623..12583e32aa6d 100644 > --- a/fs/read_write.c > +++ b/fs/read_write.c > @@ -1565,11 +1565,18 @@ ssize_t vfs_copy_file_range(struct file > *file_in, loff_t pos_in, > * and which filesystems do not, that will allow userspace tools to > * make consistent desicions w.r.t using copy_file_range(). > * > - * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE. > + * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE > + * for server-side-copy between any two sb. > + * > + * In any case, we call do_splice_direct() and not splice_file_range(), > + * without file_start_write() held, to avoid possible deadlocks related > + * to splicing from input file, while file_start_write() is held on > + * the output file on a different sb. > */ > - ret = generic_copy_file_range(file_in, pos_in, file_out, pos_out, len, > - flags); > + file_end_write(file_out); > > + ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out, > + min_t(size_t, len, MAX_RW_COUNT), 0); > done: > if (ret > 0) { > fsnotify_access(file_in); > @@ -1581,8 +1588,6 @@ ssize_t vfs_copy_file_range(struct file > *file_in, loff_t pos_in, > inc_syscr(current); > inc_syscw(current); > > - file_end_write(file_out); > - This file_end_write() is also used by the paths using ->copy_file_range() and ->remap_file_range() so you need to balance those... Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR