On Wed, Oct 10, 2018 at 3:14 AM Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote: > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > Create a RFR_TO_SRC_EOF flag to explicitly declare that the caller wants > the remap implementation to remap to the end of the source file, once > the files are locked. > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > --- > fs/ioctl.c | 3 ++- > fs/nfsd/vfs.c | 3 ++- > fs/read_write.c | 13 +++++++------ > include/linux/fs.h | 2 ++ > 4 files changed, 13 insertions(+), 8 deletions(-) > > > diff --git a/fs/ioctl.c b/fs/ioctl.c > index 505275ec5596..7fec997abd2f 100644 > --- a/fs/ioctl.c > +++ b/fs/ioctl.c > @@ -224,6 +224,7 @@ static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd, > { > struct fd src_file = fdget(srcfd); > loff_t cloned; > + unsigned int flags = olen == 0 ? RFR_TO_SRC_EOF : 0; > int ret; > > if (!src_file.file) > @@ -232,7 +233,7 @@ static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd, > if (src_file.file->f_path.mnt != dst_file->f_path.mnt) > goto fdput; > cloned = vfs_clone_file_range(src_file.file, off, dst_file, destoff, > - olen, 0); > + olen, flags); > if (cloned < 0) > ret = cloned; > else if (olen && cloned != olen) > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 726fc5b2b27a..d1f2ae08adf6 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -542,8 +542,9 @@ __be32 nfsd4_clone_file_range(struct file *src, u64 src_pos, struct file *dst, > u64 dst_pos, u64 count) > { > loff_t cloned; > + unsigned int flags = count == 0 ? RFR_TO_SRC_EOF : 0; > > - cloned = vfs_clone_file_range(src, src_pos, dst, dst_pos, count, 0); > + cloned = vfs_clone_file_range(src, src_pos, dst, dst_pos, count, flags); > if (count && cloned != count) > cloned = -EINVAL; > return nfserrno(cloned < 0 ? cloned : 0); > diff --git a/fs/read_write.c b/fs/read_write.c > index 479eb810c8e6..a628fd9a47cf 100644 > --- a/fs/read_write.c > +++ b/fs/read_write.c > @@ -1748,11 +1748,12 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, > > isize = i_size_read(inode_in); > > - /* Zero length dedupe exits immediately; reflink goes to EOF. */ > - if (*len == 0) { > - if (is_dedupe || pos_in == isize) > - return 0; > - if (pos_in > isize) > + /* > + * If the caller asked to go all the way to the end of the source file, > + * set *len now that we have the file locked. > + */ > + if (remap_flags & RFR_TO_SRC_EOF) { > + if (pos_in >= isize) > return -EINVAL; > *len = isize - pos_in; > } > @@ -1828,7 +1829,7 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in, > struct inode *inode_out = file_inode(file_out); > loff_t ret; > > - WARN_ON_ONCE(remap_flags); > + WARN_ON_ONCE(remap_flags & ~(RFR_TO_SRC_EOF)); > > if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode)) > return -EISDIR; > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 5c1bf1c35bc6..9f90dcd4df3b 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1725,8 +1725,10 @@ struct block_device_operations; > * These flags control the behavior of the remap_file_range function pointer. > * > * RFR_IDENTICAL_DATA: only remap if contents identical (i.e. deduplicate) <bikeshedding> not that I care so much, but is there any reason you chose to use _IDENTICAL_ vs. _SAME_? The latter is shorter and already engraved in the dedup uapi. </bikeshedding> > + * RFR_TO_SRC_EOF: remap to the end of the source file > */ > #define RFR_IDENTICAL_DATA (1 << 0) > +#define RFR_TO_SRC_EOF (1 << 1) > So what is the best way to make sure that all filesystems can properly handle this flag? and the RFR_CAN_SHORTEN flag? The way that your patches took is to not check for invalid flags at all in filesystems, but I don't think that is a viable option. Another way would be to individually add those flags to invalid flags check in all relevant filesystems. Another way would be to follow a pattern similar to fiemap_check_flags(), except in case filesystem does not declare to support the RFR_ "advisory" flags, it will not fail the operation. Comparing to FIEMAP_ flags, no filesystem would have needed to declare support for FIEMAP_FLAG_SYNC, because vfs dealt with it anyway before calling into the filesystem. So de-facto, any filesystem supports FIEMAP_FLAG_SYNC without doing anything, but it is still worth passing the flag into filesystem in case it matter (it does for overlayfs). I am not sure if giving fiemap_check_flags() as an example for sorting out VFS API flags is a good idea, because I completely don't understand what is going on in ext4_fiemap(). Why do FIEMAP_FLAG_CACHE and the case of !EXT4_INODE_EXTENTS bypass fiemap_check_flags()? Is there a rational behind all this or just plain old API bit rot? If bit rot, then perhaps the fiemap_check_flags() wasn't a good enough coding pattern? Thanks, Amir.