On Mon, Oct 21, 2024 at 02:49:54PM +0200, Christian Brauner wrote:
> On Sat, Oct 19, 2024 at 09:16:01AM -0700, Darrick J. Wong wrote:
> > On Fri, Oct 18, 2024 at 12:28:37PM -0400, Theodore Ts'o wrote:
> > > I've been running a watcher which automatically kicks off xfstests on
> > > some 20+ file system configurations for btrfs, ext4, f2fs, and xfs
> > > every time fs-next gets updated, and I've noticed that generic/564 has
> > > been failing for essentially all of the configurations that I test.
> > > The test succeeds on rc3; it's only failing on fs-next, so it's
> > > something in linux-next.
> > >
> > > The weird thing is that when I attempted to bisect it (and I've tried
> > > twice in the last two days), the bisection identifies the first bad
> > > commit as Stephen's merge of vfs-brauner into linux-next:
> > >
> > > commit b3efa2373eed4e08e62b50898f8c3a4e757e14c3 (linux-next/fs-next)
> > > Merge: 233650c5fbb8 2232c1874e5c
> > > Author: Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx>
> > > Date:   Thu Oct 17 12:45:50 2024 +1100
> > >
> > >     next-20241016/vfs-brauner
> > >
> > >     # Conflicts:
> > >     #	fs/btrfs/file.c
> > >     #	include/linux/iomap.h
> > >
> > > The merge resolution looks utterly innocuous, and it seems unrelated to
> > > what generic/564 tests, which is the errors returned by
> > > copy_file_range(2):
> > >
> > > # Exercise copy_file_range() syscall error conditions.
> > > #
> > > # This is a regression test for kernel commit:
> > > #   96e6e8f4a68d ("vfs: add missing checks to copy_file_range")
> > > #
> > >
> > > # diff -u /root/xfstests/tests/generic/564.out /results/ext4/results-4k/generic/564.out.bad
> > > --- /root/xfstests/tests/generic/564.out	2024-10-15 13:27:36.000000000 -0400
> > > +++ /results/ext4/results-4k/generic/564.out.bad	2024-10-18 12:23:58.629855983 -0400
> > > @@ -29,9 +29,10 @@
> > >  copy_range: Value too large for defined data type
> > >
> > >  source range beyond 8TiB returns 0
> > > +copy_range: Value too large for defined data type
> > >
> > >  destination range beyond 8TiB returns EFBIG
> > > -copy_range: File too large
> > > +copy_range: Value too large for defined data type
> > >
> > >  destination larger than rlimit returns EFBIG
> > >  File size limit exceeded
> > >
> > > Could someone take a look, and let me know if I've missed something
> > > obvious?
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/fs/read_write.c?h=fs-next&id=0f0f217df68fd72d91d2de6e85a6dd80fa1f5c95
> > perhaps?
> >
> > I think the problem here is that in the old code:
> >
> > 	pos_in + count < pos_in
> >
> > @count is unsigned, so I think the compiler uses an unsigned comparison
> > and thus pos_in + count is a very large positive value, instead of the
> > negative value that the code author (who could possibly be me :P)
> > thought they were getting.  Hence this now triggers EOVERFLOW instead
> > of the "Shorten the copy to EOF" or generic_write_check_limits EFBIG
> > logic.
> >
> > To Mr. Sun: did you see these regressions when you tested this patch?
>
> So we should drop this patch for now. It definitely shouldn't go
> upstream.

I was assuming that the submitter had actually *tested* the change before
sending it.
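To make the failure mode concrete, here's a tiny userspace sketch of that
comparison (not kernel code; the offsets are made up, it assumes a 64-bit
size_t, and __builtin_add_overflow stands in for the kernel's
check_add_overflow()):

/*
 * Demo of the old "pos_in + count < pos_in" wrap check versus a
 * check_add_overflow()-style check.  Standalone userspace sketch.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

int main(void)
{
	long long pos_in = 1LL << 62;		    /* huge source offset */
	size_t count = (1ULL << 62) + (1ULL << 61); /* sum exceeds LLONG_MAX */
	long long tmp;

	/*
	 * Old check: count is unsigned, so the addition and the comparison
	 * are both done as unsigned.  The sum shows up as a huge unsigned
	 * value rather than a negative loff_t, so this is false and the
	 * old code never returned -EOVERFLOW for this case.
	 */
	bool old_wrap = pos_in + count < pos_in;

	/*
	 * Signed-overflow check: this *does* trip, which is why the new
	 * code returns -EOVERFLOW where the old code fell through to the
	 * "shorten to EOF" / EFBIG paths.
	 */
	bool new_wrap = __builtin_add_overflow(pos_in, (long long)count, &tmp);

	printf("old unsigned comparison trips: %d\n", old_wrap); /* 0 */
	printf("__builtin_add_overflow trips:  %d\n", new_wrap); /* 1 */
	return 0;
}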
I /think/ the validation could be fixed by making generic_copy_file_checks
do something like this:

	if (pos_in < 0 || pos_out < 0)
		return -EINVAL;

	size_in = i_size_read(inode_in);
	if (pos_in >= size_in)
		count = 0;
	else
		count = min(count, size_in - pos_in);

	if (check_add_overflow(pos_in, count, &tmp))
		return -EOVERFLOW;
	if (check_add_overflow(pos_out, count, &tmp))
		return -EOVERFLOW;

	ret = generic_write_check_limits(file_out, pos_out, &count);

instead of what it does now... but I'm not convinced the overflow checks do
much since I think we already constrain count so that it can't overflow
either file.

--D
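P.S. A quick userspace sketch of that last point, i.e. why the pos_in
overflow check is probably moot once count has been clamped to EOF: after
the clamp, pos_in + count <= size_in, and size_in is already bounded by the
filesystem's size limit.  (Made-up sizes; s_maxbytes and MIN here are local
stand-ins for the superblock limit and the kernel's min().)

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))

int main(void)
{
	long long s_maxbytes = 1LL << 43;	/* stand-in per-fs size limit */
	long long size_in = s_maxbytes;		/* worst case: file at max size */
	long long pos_in = s_maxbytes - 1;	/* any in-bounds offset */
	size_t count = SIZE_MAX;		/* absurdly large request */

	/* The clamp from the sketch above. */
	if (pos_in >= size_in)
		count = 0;
	else
		count = MIN(count, (size_t)(size_in - pos_in));

	/* After the clamp the sum is bounded by size_in, far below LLONG_MAX. */
	assert(pos_in + (long long)count <= size_in);
	return 0;
}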