On Fri, Feb 12, 2021 at 12:59 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Thu, Feb 11, 2021 at 08:53:47PM -0800, Darrick J. Wong wrote:
> > On Fri, Feb 12, 2021 at 12:44:05PM +0800, Nicolas Boichat wrote:
> > > copy_file_range (which calls generic_copy_file_checks) uses the
> > > inode file size to adjust the copy count parameter. This breaks
> > > with special filesystems like procfs/sysfs/debugfs/tracefs, where
> > > the file size appears to be zero, but content is actually returned
> > > when a read operation is performed. Other issues would also
> > > happen on partial writes, as the function would attempt to seek
> > > in the input file.
> > >
> > > Use the newly introduced FS_GENERATED_CONTENT filesystem flag
> > > to return -EOPNOTSUPP: applications can then retry with a more
> > > usual read/write based file copy (the fallback code is usually
> > > already present to handle older kernels).
> > >
> > > Signed-off-by: Nicolas Boichat <drinkcat@xxxxxxxxxxxx>
> > > ---
> > >
> > > fs/read_write.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > index 0029ff2b0ca8..80322e89fb0a 100644
> > > --- a/fs/read_write.c
> > > +++ b/fs/read_write.c
> > > @@ -1485,6 +1485,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > > if (flags != 0)
> > > return -EINVAL;
> > >
> > > + if (file_inode(file_in)->i_sb->s_type->fs_flags & FS_GENERATED_CONTENT)
> > > + return -EOPNOTSUPP;
> >
> > Why not declare a dummy copy_file_range_nop function that returns
> > EOPNOTSUPP and point all of these filesystems at it?
> >
> > (Or, I guess in these days where function pointers are the enemy,
> > create a #define that is a cast of 0x1, and fix do_copy_file_range to
> > return EOPNOTSUPP if it sees that?)
I was pondering abusing ERR_PTR(-EOPNOTSUPP) for this purpose ,-P

>
> Oh, I see, because that doesn't help if the source file is procfs and
> the dest file is (say) xfs, because the generic version will try to do
> splice magic and *poof*.

Yep. I mean, we could still add a check in do_copy_file_range for
file_in->f_op->copy_file_range == copy_file_range_nop... But then we'd
need to sprinkle .copy_file_range = copy_file_range_nop in many, many
places (~700 as a lower bound[1]), since the file operations structure
is defined at the file level, not at the FS level, and people are
likely to forget...

[1]
$ git grep "struct file_operations.*=" | grep debug | wc -l
631
$ git grep "struct file_operations.*=" | grep trace | wc -l
84

>
> I guess the other nit that I can think of at this late hour is ... what
> about the other virtual filesystems like configfs and whatnot? Should
> we have a way to flag them as "this can't be the source of a CFR
> request" as well?
>
> Or is it just trace/debug/proc/sysfs that have these "zero size but
> readable" speshul behaviors?

I did try to audit the other filesystems. The ones I spotted:
- devpts should be fine (only device nodes in there).
- I think pstore doesn't need the flag, as it's RAM-backed and persistent.

But yes, I missed configfs, thanks for catching that. I think we need to
add the flag for that one (looks like the sizes are all 4K).

>
> --D
>
> >
> > --D
> >
> > > +
> > > ret = generic_copy_file_checks(file_in, pos_in, file_out, pos_out, &len,
> > > flags);
> > > if (unlikely(ret))
> > > --
> > > 2.30.0.478.g8a0d178c01-goog
> > >
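A small user-space model may make the trade-off between a per-filesystem flag and a per-file_operations sentinel more concrete. It is only a sketch: the structs, the bit value of FS_GENERATED_CONTENT and copy_file_range_nop are hypothetical stand-ins that echo names from this thread, not kernel code. It assumes the shared path clamps the length to the source's i_size (as the patch description says generic_copy_file_checks does) and then uses the destination's hook or a generic copy, which is why a sentinel on the source only helps where someone remembered to set it.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model: names echo the thread, this is not kernel code. */
#define FS_GENERATED_CONTENT 0x1 /* assumed bit value, for this model only */

struct file;
typedef ssize_t (*copy_fn)(struct file *in, struct file *out, size_t len);

struct fs_type  { int fs_flags; };            /* one per filesystem type */
struct file_ops { copy_fn copy_file_range; }; /* one per file_operations instance */
struct file {
    const struct fs_type *fs;
    const struct file_ops *f_op;
    size_t i_size;                            /* reported as 0 on procfs, sysfs, ... */
};

/* Darrick's suggestion: a dummy hook that always refuses. */
static ssize_t copy_file_range_nop(struct file *in, struct file *out, size_t len)
{
    (void)in; (void)out; (void)len;
    return -EOPNOTSUPP;
}

/* Shared path (simplified): clamp len to the source i_size, then use the
 * destination's hook, or a "generic" copy of len bytes if it has none. */
static ssize_t copy_common(struct file *in, struct file *out, size_t len)
{
    if (len > in->i_size)
        len = in->i_size;
    if (out->f_op->copy_file_range)
        return out->f_op->copy_file_range(in, out, len);
    return (ssize_t)len;
}

/* The patch's approach: one flag on the filesystem type covers every file. */
static ssize_t copy_checking_fs_flag(struct file *in, struct file *out, size_t len)
{
    if (in->fs->fs_flags & FS_GENERATED_CONTENT)
        return -EOPNOTSUPP;
    return copy_common(in, out, len);
}

/* The alternative: compare the source's hook against the sentinel. Only
 * file_operations instances that were explicitly annotated are caught. */
static ssize_t copy_checking_nop_hook(struct file *in, struct file *out, size_t len)
{
    if (in->f_op->copy_file_range == copy_file_range_nop)
        return -EOPNOTSUPP;
    return copy_common(in, out, len);
}
```

With one forgotten annotation on a zero-sized procfs-like file, the sentinel variant returns 0 (it silently copies nothing), while the filesystem-flag variant still refuses with -EOPNOTSUPP.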
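On the user-space side, the patch description relies on applications retrying with plain read()/write() when copy_file_range() fails with EOPNOTSUPP. A minimal sketch of such a fallback follows, assuming glibc 2.27 or later (which provides the copy_file_range() wrapper); copy_fd and its helpers are hypothetical names. Besides EOPNOTSUPP, it also falls back on EXDEV (cross-filesystem copies on some kernels), ENOSYS (no syscall) and EINVAL (for example, a non-regular file). On a kernel without the proposed check, copy_file_range() from a procfs file just returns 0, which this code cannot tell apart from a genuine EOF.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer, retrying on short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Plain read()/write() loop: copies until read() reports EOF, so it also
 * works for files that report st_size == 0 but still return data. */
static ssize_t copy_read_write(int fd_in, int fd_out)
{
    char buf[65536];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd_in, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return total;
        if (write_all(fd_out, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}

/* Try copy_file_range() first; fall back to read()/write() if the kernel
 * refuses before anything was copied. Returns bytes copied, or -1. */
static ssize_t copy_fd(int fd_in, int fd_out)
{
    ssize_t total = 0;

    for (;;) {
        ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL,
                                    (size_t)1 << 30, 0);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total; /* EOF as seen by the kernel (i_size based) */
        if (total == 0 && (errno == EOPNOTSUPP || errno == EXDEV ||
                           errno == ENOSYS || errno == EINVAL))
            return copy_read_write(fd_in, fd_out);
        return -1;
    }
}
```

The fallback only kicks in while nothing has been copied yet; an error after a partial copy is reported to the caller rather than masked.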