On Thu 30-11-23 16:16:24, Amir Goldstein wrote: > nfsd/ksmbd call vfs_copy_file_range() with flag COPY_FILE_SPLICE to > perform kernel copy between two files on any two filesystems. > > Splicing input file, while holding file_start_write() on the output file > which is on a different sb, posses a risk for fanotify related deadlocks. > > We only need to call splice_file_range() from within the context of > ->copy_file_range() filesystem methods with file_start_write() held. > > To avoid the possible deadlocks, always use do_splice_direct() instead of > splice_file_range() for the kernel copy fallback in vfs_copy_file_range() > without holding file_start_write(). > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> Looks good to me. Feel free to add: Reviewed-by: Jan Kara <jack@xxxxxxx> Honza > --- > fs/read_write.c | 36 +++++++++++++++++++++++------------- > 1 file changed, 23 insertions(+), 13 deletions(-) > > diff --git a/fs/read_write.c b/fs/read_write.c > index 0bc99f38e623..e0c2c1b5962b 100644 > --- a/fs/read_write.c > +++ b/fs/read_write.c > @@ -1421,6 +1421,10 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in, > struct file *file_out, loff_t pos_out, > size_t len, unsigned int flags) > { > + /* May only be called from within ->copy_file_range() methods */ > + if (WARN_ON_ONCE(flags)) > + return -EINVAL; > + > return splice_file_range(file_in, &pos_in, file_out, &pos_out, > min_t(size_t, len, MAX_RW_COUNT)); > } > @@ -1541,19 +1545,22 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > ret = file_out->f_op->copy_file_range(file_in, pos_in, > file_out, pos_out, > len, flags); > - goto done; > - } > - > - if (!splice && file_in->f_op->remap_file_range && > - file_inode(file_in)->i_sb == file_inode(file_out)->i_sb) { > + } else if (!splice && file_in->f_op->remap_file_range && > + file_inode(file_in)->i_sb == file_inode(file_out)->i_sb) { > ret = file_in->f_op->remap_file_range(file_in, pos_in, > file_out, pos_out, > min_t(loff_t, MAX_RW_COUNT, len), > REMAP_FILE_CAN_SHORTEN); > - if (ret > 0) > - goto done; > + /* fallback to splice */ > + if (ret <= 0) > + splice = true; > } > > + file_end_write(file_out); > + > + if (!splice) > + goto done; > + > /* > * We can get here for same sb copy of filesystems that do not implement > * ->copy_file_range() in case filesystem does not support clone or in > @@ -1565,11 +1572,16 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > * and which filesystems do not, that will allow userspace tools to > * make consistent desicions w.r.t using copy_file_range(). > * > - * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE. > + * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE > + * for server-side-copy between any two sb. > + * > + * In any case, we call do_splice_direct() and not splice_file_range(), > + * without file_start_write() held, to avoid possible deadlocks related > + * to splicing from input file, while file_start_write() is held on > + * the output file on a different sb. > */ > - ret = generic_copy_file_range(file_in, pos_in, file_out, pos_out, len, > - flags); > - > + ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out, > + min_t(size_t, len, MAX_RW_COUNT), 0); > done: > if (ret > 0) { > fsnotify_access(file_in); > @@ -1581,8 +1593,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > inc_syscr(current); > inc_syscw(current); > > - file_end_write(file_out); > - > return ret; > } > EXPORT_SYMBOL(vfs_copy_file_range); > -- > 2.34.1 > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR