On Thu, Nov 30, 2023 at 10:32 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > On Wed, Nov 29, 2023 at 10:07 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > Christian, > > > > Josef has helped me see the light and figure out how to avoid the > > possible deadlock, which involves: > > - splice() from source file in a loop mounted fs to dest file in > > a host fs, where the loop image file is > > - fsfreeze on host fs > > - write to host fs in context of fanotify permission event handler > > (FAN_ACCESS_PERM) on the splice source file > > > > The first patch should not be changing any logic. > > I only build tested the ceph patch, so hoping to get an > > Acked-by/Tested-by from Jeff. > > > > The second patch rids us of the deadlock by not holding > > file_start_write() while reading from splice source file. > > > > OOPS, I missed another corner case: > The COPY_FILE_SPLICE fallback of server-side-copy in nfsd/ksmbd > needs to use the start-write-safe variant of do_splice_direct(), > because in this case src and dst can be on any two fs. > Expect an extra patch in v2. > For the interested, see server-side-copy patch below. Pushed to branch start-write-safe [1], but will wait with v2 until I get comments on v1. Thanks, Amir. [1] https://github.com/amir73il/linux/commits/start-write-safe Author: Amir Goldstein <amir73il@xxxxxxxxx> Date: Thu Nov 30 11:42:50 2023 +0200 fs: use do_splice_direct() for nfsd/ksmbd server-side-copy nfsd/ksmbd call vfs_copy_file_range() with flag COPY_FILE_SPLICE to perform kernel copy between two files on any two filesystems. Splicing input file, while holding file_start_write() on the output file which is on a different sb, posses a risk for fanotify related deadlocks. We only need to call splice_file_range() from within the context of ->copy_file_range() filesystem methods with file_start_write() held. To avoid the possible deadlocks, always use do_splice_direct() instead of splice_file_range() for the kernel copy fallback in vfs_copy_file_range() without holding file_start_write(). Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> diff --git a/fs/read_write.c b/fs/read_write.c index 0bc99f38e623..12583e32aa6d 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1565,11 +1565,18 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, * and which filesystems do not, that will allow userspace tools to * make consistent desicions w.r.t using copy_file_range(). * - * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE. + * We also get here if caller (e.g. nfsd) requested COPY_FILE_SPLICE + * for server-side-copy between any two sb. + * + * In any case, we call do_splice_direct() and not splice_file_range(), + * without file_start_write() held, to avoid possible deadlocks related + * to splicing from input file, while file_start_write() is held on + * the output file on a different sb. */ - ret = generic_copy_file_range(file_in, pos_in, file_out, pos_out, len, - flags); + file_end_write(file_out); + ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out, + min_t(size_t, len, MAX_RW_COUNT), 0); done: if (ret > 0) { fsnotify_access(file_in); @@ -1581,8 +1588,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, inc_syscr(current); inc_syscw(current); - file_end_write(file_out); - return ret; } EXPORT_SYMBOL(vfs_copy_file_range);