On Wed, Oct 31, 2018 at 10:51:48AM -0400, Olga Kornievskaia wrote:
> On Tue, Oct 30, 2018 at 8:15 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Oct 30, 2018 at 05:10:58PM -0400, Olga Kornievskaia wrote:
> > > On Tue, Oct 30, 2018 at 5:03 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Oct 29, 2018 at 10:41:22AM -0400, Olga Kornievskaia wrote:
> > > > > On Sat, Oct 27, 2018 at 5:27 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Fri, Oct 26, 2018 at 04:10:48PM -0400, Olga Kornievskaia wrote:
> > > > > > > From: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > > > > > >
> > > > > > > Input source offset can't be beyond the end of the file.
> > > > > > >
> > > > > > > Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > > > > > > ---
> > > > > > >  fs/read_write.c | 3 +++
> > > > > > >  1 file changed, 3 insertions(+)
> > > > > > >
> > > > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > > > index fb4ffca..b3b304e 100644
> > > > > > > --- a/fs/read_write.c
> > > > > > > +++ b/fs/read_write.c
> > > > > > > @@ -1594,6 +1594,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > > > >  		}
> > > > > > >  	}
> > > > > > >
> > > > > > > +	if (pos_in >= i_size_read(inode_in))
> > > > > > > +		return -EINVAL;
> > > > > > > +
> > > > > >
> > > > > > vfs_copy_file_range seems to be missing a wide range of checks.
> > > > > > rlimit, s_maxbytes, LFS file sizes, etc. This is a write, so all the
> > > > > > checks in generic_write_checks() apply, right? And the same security
> > > > > > issues like stripping setuid bits, etc? And we need to touch
> > > > > > atime on the source file, too?
> > > > >
> > > > > Yes, these sound like needed checks.
> > > > >
> > > > > > We've just merged 5 or so patches in 4.19-rc8 and we're ready to
> > > > > > merge another ~30 patch series to fix all the stuff missing from the
> > > > > > clone/dedupe file range operations that make them safe and robust.
> > > > > > It seems like copy_file_range is missing all the checks it needs, too?
> > > > >
> > > > > Are you proposing to not do this check now in favor of the proper work
> > > > > that will do all of those checks you listed above?
> > > >
> > > > No, I'm saying that if you're adding one check, there's a whole heap
> > > > of checks that still need to be added, *especially* if this is going
> > > > to fall back to page cache copy between superblocks that may have
> > > > different limits and constraints.
> > > >
> > > > There are security issues in this API. They need to be fixed before we
> > > > allow it to do more and potentially expose more problems due to its
> > > > wider capability.
> > >
> > > Before I totally give up on this feature, can you help me understand
> > > your concerns with allowing the generic copy_file_range via
> > > do_splice().
> >
> > It's not do_splice_direct() I'm concerned about. It's /writing data
> > without adequate checks/ that I'm concerned about.
> > ->copy_file_range() also writes data, so it needs to undergo the
> > same safety checks as well.
>
> Thank you Dave for clarifying and elaborating on the points. As you
> pointed out, these concerns apply to the current code the same way as to
> the patch series. Those concerns should be addressed; however, I feel like
> they shouldn't be the responsibility of this particular patch series.
> Therefore, I ask for the community to either make any final comments
> for any changes that are needed to "version 7" patches and if no more
> comments arise I would like to ask for this to be added to the queue
> for the next kernel version.
>
> Then the next patch series would be just VFS and would add appropriate
> checks and then allow for the generic copy_file_range() via do_splice.

That's fine by me.

> > > I have mentioned I'm not a VFS expert thus I come from just looking at
> > > the available documentation and the code.
> > >
> > > I don't see any restrictions on the files being passed in the
> > > do_splice_direct(). There are no restrictions that they must be from
> > > the same filesystem or file system type. But perhaps this is not the
> > > concern you had but more about checking validity of arguments?
> > >
> > > I have looked at Darrick Wong's, if I'm not mistaken these 2 are the
> > > relevant patches:
> > > [PATCH 02/28] vfs: check file ranges before cloning files
> > > -- a couple but not all checks apply to copy_file_range().
> >
> > Yes, of course - clone/dedupe have different constraints, but the
> > core checks are still needed for copy_file_range().
> >
> > For example, the man page says:
> >
> >   EINVAL
> >       Requested range extends beyond the end of the source
> >       file; or the flags argument is not 0.
> >
> > Your patch above doesn't actually check that - it only checks if the
> > pos_in is beyond EOF. It needs to check if pos_in + len is beyond
> > EOF. After checking for wraps, of course.
>
> There was a reason why I didn't include the "pos_in + len" check. It
> sparked the conversation why should "pos_in + len" be an error, when a
> "read" system call would just return a "short" read and EOF. So I
> dropped the check for "pos_in + len" to be an error.

So man page patches will be required, too. :)

Basically, we need to nail down the expected semantics, make sure
they are correctly documented and /enforced consistently/ across all
filesystems.

> > > -- these checks apply to the code once we fall back to the
> > > do_splice().
> >
> > man page says:
> >
> >   EFBIG
> >       An attempt was made to write a file that exceeds the
> >       implementation-defined maximum file size or the process's
> >       file size limit, or to write at a position past the maximum
> >       allowed offset.
> >
> > These conditions apply to the destination file regardless of the method
> > used to copy the data. That's what the generic methods now check for
> > clone/dedupe, and need to be used here, too.
>
> Agreed and once Darrick's patches are in, copy_file_range() can use them too.

Should be in the next couple of days.

> > 7debbf015f58 xfs: update ctime and remove suid before cloning files
> >
> > Which then got moved into the generic remap_file_range code in
> > Darrick's "vfs: remap helper should update destination inode
> > metadata" patch:
> >
> > https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&id=8dde90bca6fca3736ea20109654bcf6dcf2ecf1d
> >
> > We can't assume that a server side copy is going to strip setuid
> > bits or even update target files c/mtimes.
>
> I would like to discuss your concerns about updating attributes
> (c/m/atimes), why shouldn't it be a ->copy_file_range()
> responsibility. copy_file_range is basically a read+write. As far as I
> can tell, vfs_read and vfs_write (in VFS) don't deal with updating
> attributes.

You're looking at the wrong level. The VFS layer is the first
multiplexing layer, allowing filesystems to select a method of
handling functionality. They then make use of "generic helpers" to
implement the required functionality, and they contain the required
updates.

e.g. a list of generic helpers with atime update callers from my
cscope index:

f fs/pipe.c    pipe_read                   343 file_accessed(filp);
h fs/readdir.c iterate_dir                  56 file_accessed(file);
i fs/splice.c  generic_file_splice_read    311 file_accessed(in);
j fs/splice.c  splice_direct_to_actor      992 file_accessed(in);
p mm/filemap.c generic_file_buffered_read 2299 file_accessed(filp);
q mm/filemap.c generic_file_read_iter     2339 file_accessed(file);
r mm/filemap.c generic_file_mmap          2736 file_accessed(file);

These are effectively reference implementations of the file reading
infrastructure. Filesystems often have customised implementations
but they all must contain the same functionality and behaviour as
the reference implementation.

> I'm guessing it's assumed that underlying file systems are
> going to take care of it (unless of course I misread the code).

Only the ones that don't specifically call the generic helper to do
the work.

IOWs, what I'd like to see is a generic_copy_file_range() as the
reference implementation using a page cache copy. This contains all
the required checks, timestamp updates, etc. If the filesystem does
not supply ->copy_file_range, then generic_copy_file_range() is
called, not do_splice_direct(). Indeed, a filesystem should be able
to do:

	.copy_file_range = xfs_copy_file_range,

xfs_copy_file_range(...)
{
	trace_xfs_copy_file_range(...);
	return generic_copy_file_range(....);
}

and have everything work correctly.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
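
For illustration only, the source-range check discussed above (pos_in +
len against EOF, after checking for wraps) could be modelled in
user-space C roughly as below. This is a sketch under assumptions, not
kernel code and not the check that was merged: check_copy_src_range()
and copy_range_examples() are hypothetical names, i_size stands in for
i_size_read(inode_in), and whether a range crossing EOF should be an
error or a short copy is exactly the semantic question still open in
this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the source-range check; i_size
 * stands in for i_size_read(inode_in). Returns 0 or -EINVAL.
 */
int check_copy_src_range(int64_t pos_in, uint64_t len, int64_t i_size)
{
	/* Negative offsets or sizes are never valid. */
	if (pos_in < 0 || i_size < 0)
		return -EINVAL;

	/* Wrap check first: pos_in + len must fit in a signed 64-bit offset. */
	if (len > (uint64_t)(INT64_MAX - pos_in))
		return -EINVAL;

	/* The addition cannot overflow now; reject ranges extending past EOF. */
	if ((uint64_t)pos_in + len > (uint64_t)i_size)
		return -EINVAL;

	return 0;
}

/* A few example calls against the model above. */
void copy_range_examples(void)
{
	/* Range entirely inside a 10-byte file. */
	assert(check_copy_src_range(0, 10, 10) == 0);
	/* Range starts inside the file but runs past EOF. */
	assert(check_copy_src_range(5, 10, 10) == -EINVAL);
	/* pos_in + len would wrap past the largest signed offset. */
	assert(check_copy_src_range(INT64_MAX, 1, INT64_MAX) == -EINVAL);
}
```

Doing the wrap test before the addition is the point of the ordering:
once len is known to fit below INT64_MAX - pos_in, the sum used in the
EOF comparison cannot overflow.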