On Tue, Feb 16, 2021 at 8:54 PM Luis Henriques <lhenriques@xxxxxxx> wrote: > > Amir Goldstein <amir73il@xxxxxxxxx> writes: > > > On Tue, Feb 16, 2021 at 6:41 PM Luis Henriques <lhenriques@xxxxxxx> wrote: > >> > >> Amir Goldstein <amir73il@xxxxxxxxx> writes: > >> > >> >> Ugh. And I guess overlayfs may have a similar problem. > >> > > >> > Not exactly. > >> > Generally speaking, overlayfs should call vfs_copy_file_range() > >> > with the flags it got from layer above, so if called from nfsd it > >> > will allow cross fs copy and when called from syscall it won't. > >> > > >> > There are some corner cases where overlayfs could benefit from > >> > COPY_FILE_SPLICE (e.g. copy from lower file to upper file), but > >> > let's leave those for now. Just leave overlayfs code as is. > >> > >> Got it, thanks for clarifying. > >> > >> >> > This is easy to solve with a flag COPY_FILE_SPLICE (or something) that > >> >> > is internal to kernel users. > >> >> > > >> >> > FWIW, you may want to look at the loop in ovl_copy_up_data() > >> >> > for improvements to nfsd_copy_file_range(). > >> >> > > >> >> > We can move the check out to copy_file_range syscall: > >> >> > > >> >> > if (flags != 0) > >> >> > return -EINVAL; > >> >> > > >> >> > Leave the fallback from all filesystems and check for the > >> >> > COPY_FILE_SPLICE flag inside generic_copy_file_range(). > >> >> > >> >> Ok, the diff bellow is just to make sure I understood your suggestion. > >> >> > >> >> The patch will also need to: > >> >> > >> >> - change nfs and overlayfs calls to vfs_copy_file_range() so that they > >> >> use the new flag. > >> >> > >> >> - check flags in generic_copy_file_checks() to make sure only valid flags > >> >> are used (COPY_FILE_SPLICE at the moment). > >> >> > >> >> Also, where should this flag be defined? include/uapi/linux/fs.h? > >> > > >> > Grep for REMAP_FILE_ > >> > Same header file, same Documentation rst file. > >> > > >> >> > >> >> Cheers, > >> >> -- > >> >> Luis > >> >> > >> >> diff --git a/fs/read_write.c b/fs/read_write.c > >> >> index 75f764b43418..341d315d2a96 100644 > >> >> --- a/fs/read_write.c > >> >> +++ b/fs/read_write.c > >> >> @@ -1383,6 +1383,13 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in, > >> >> struct file *file_out, loff_t pos_out, > >> >> size_t len, unsigned int flags) > >> >> { > >> >> + if (!(flags & COPY_FILE_SPLICE)) { > >> >> + if (!file_out->f_op->copy_file_range) > >> >> + return -EOPNOTSUPP; > >> >> + else if (file_out->f_op->copy_file_range != > >> >> + file_in->f_op->copy_file_range) > >> >> + return -EXDEV; > >> >> + } > >> > > >> > That looks strange, because you are duplicating the logic in > >> > do_copy_file_range(). Maybe better: > >> > > >> > if (WARN_ON_ONCE(flags & ~COPY_FILE_SPLICE)) > >> > return -EINVAL; > >> > if (flags & COPY_FILE_SPLICE) > >> > return do_splice_direct(file_in, &pos_in, file_out, &pos_out, > >> > len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0); > >> > >> My initial reasoning for duplicating the logic in do_copy_file_range() was > >> to allow the generic_copy_file_range() callers to be left unmodified and > >> allow the filesystems to default to this implementation. > >> > >> With this change, I guess that the calls to generic_copy_file_range() from > >> the different filesystems can be dropped, as in my initial patch, as they > >> will always get -EINVAL. The other option would be to set the > >> COPY_FILE_SPLICE flag in those calls, but that would get us back to the > >> problem we're trying to solve. > > > > I don't understand the problem. > > > > What exactly is wrong with the code I suggested? > > Why should any filesystem be changed? > > > > Maybe I am missing something. > > Ok, I have to do a full brain reboot and start all over. > > Before that, I picked the code you suggested and tested it. I've mounted > a cephfs filesystem and used xfs_io to execute a 'copy_range' command > using /sys/kernel/debug/sched_features as source. The result was a > 0-sized file in cephfs. And the reason is thevfs_copy_file_range() > early exit in: > > if (len == 0) > return 0; > > 'len' is set in generic_copy_file_checks(). Good point.. I guess we will need to do all the checks earlier in generic_copy_file_checks() including the logic of: if (file_in->f_op->remap_file_range && file_inode(file_in)->i_sb == file_inode(file_out)->i_sb) > > This means that we're not solving the original problem anymore (probably > since v1 of this patch, haven't checked). > > Also, re-reading Trond's emails, I read: "... also disallowing the copy > from, say, an XFS formatted partition to an ext4 partition". Isn't that > *exactly* what we're trying to do here? I.e. _prevent_ these copies from > happening so that tracefs files can't be CFR'ed? > We want to address the report which means calls coming from copy_file_range() syscall. Trond's use case is vfs_copy_file_range() coming from nfsd. When he writes about copy from XFS to ext4, he means an NFS client is issuing server side copy (on same or different NFS mounts) and the NFS server is executing nfsd_copy_file_range() on a source file that happens to be on XFS and destination happens to be on ext4. We can undo the copy_file_range() syscall change of behavior from v5.3 without regressing the NFS use case. We just need to be careful and look at all the affected code paths. Thanks, Amir.