On Mon, Oct 21, 2024 at 02:49:54PM +0200, Christian Brauner wrote:
> On Sat, Oct 19, 2024 at 09:16:01AM -0700, Darrick J. Wong wrote:
> > On Fri, Oct 18, 2024 at 12:28:37PM -0400, Theodore Ts'o wrote:
> > > I've been running a watcher which automatically kicks off xfstests on
> > > some 20+ file system configurations for btrfs, ext4, f2fs, and xfs
> > > every time fs-next gets updated, and I've noticed that generic/564 has
> > > been failing for essentially all of the configurations that I test.
> > > The test succeeds on rc3; it's only failing on fs-next, so it's
> > > something in linux-next.
> > >
> > > The weird thing is that when I attempted to bisect it (and I've tried
> > > twice in the last two days), the bisection identifies the first bad
> > > commit as Stephen's merge of vfs-brauner into linux-next:
> > >
> > > commit b3efa2373eed4e08e62b50898f8c3a4e757e14c3 (linux-next/fs-next)
> > > Merge: 233650c5fbb8 2232c1874e5c
> > > Author: Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx>
> > > Date:   Thu Oct 17 12:45:50 2024 +1100
> > >
> > >     next-20241016/vfs-brauner
> > >
> > >     # Conflicts:
> > >     #	fs/btrfs/file.c
> > >     #	include/linux/iomap.h
> > >
> > > The merge resolution looks utterly innocuous, and it seems unrelated to
> > > what generic/564 tests, which is the errors returned by
> > > copy_file_range(2):
> > >
> > > # Exercise copy_file_range() syscall error conditions.
> > > #
> > > # This is a regression test for kernel commit:
> > > #   96e6e8f4a68d ("vfs: add missing checks to copy_file_range")
> > > #
> > >
> > > # diff -u /root/xfstests/tests/generic/564.out /results/ext4/results-4k/generic/564.out.bad
> > > --- /root/xfstests/tests/generic/564.out	2024-10-15 13:27:36.000000000 -0400
> > > +++ /results/ext4/results-4k/generic/564.out.bad	2024-10-18 12:23:58.629855983 -0400
> > > @@ -29,9 +29,10 @@
> > >  copy_range: Value too large for defined data type
> > >
> > >  source range beyond 8TiB returns 0
> > > +copy_range: Value too large for defined data type
> > >
> > >  destination range beyond 8TiB returns EFBIG
> > > -copy_range: File too large
> > > +copy_range: Value too large for defined data type
> > >
> > >  destination larger than rlimit returns EFBIG
> > >  File size limit exceeded
> > >
> > > Could someone take a look, and let me know if I've missed something
> > > obvious?
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/fs/read_write.c?h=fs-next&id=0f0f217df68fd72d91d2de6e85a6dd80fa1f5c95
> > perhaps?
> >
> > I think the problem here is that in the old code:
> >
> > 	pos_in + count < pos_in
> >
> > @count is unsigned, so I think the compiler uses an unsigned comparison
> > and thus pos_in + count is a very large positive value, instead of the
> > negative value that the code author (who could possibly be me :P)
> > thought they were getting.  Hence this now triggers EOVERFLOW instead
> > of the "Shorten the copy to EOF" or generic_write_check_limits EFBIG
> > logic.
> >
> > To Mr. Sun: did you see these regressions when you tested this patch?
>
> So we should drop this patch for now. It definitely shouldn't go
> upstream.

I was assuming that the submitter had actually *tested* the change before
sending it.
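To make the failure mode concrete, here's a tiny userspace sketch of that
comparison (not kernel code; the offsets are made up, it assumes a 64-bit
size_t, and __builtin_add_overflow stands in for the kernel's
check_add_overflow()):

/*
 * Demo of the old "pos_in + count < pos_in" wrap check versus a
 * check_add_overflow()-style check.  Standalone userspace sketch.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

int main(void)
{
	long long pos_in = 1LL << 62;		    /* huge source offset */
	size_t count = (1ULL << 62) + (1ULL << 61); /* sum exceeds LLONG_MAX */
	long long tmp;

	/*
	 * Old check: count is unsigned, so the addition and the comparison
	 * are both done as unsigned.  The sum shows up as a huge unsigned
	 * value rather than a negative loff_t, so this is false and the
	 * old code never returned -EOVERFLOW for this case.
	 */
	bool old_wrap = pos_in + count < pos_in;

	/*
	 * Signed-overflow check: this *does* trip, which is why the new
	 * code returns -EOVERFLOW where the old code fell through to the
	 * "shorten to EOF" / EFBIG paths.
	 */
	bool new_wrap = __builtin_add_overflow(pos_in, (long long)count, &tmp);

	printf("old unsigned comparison trips: %d\n", old_wrap); /* 0 */
	printf("__builtin_add_overflow trips:  %d\n", new_wrap); /* 1 */
	return 0;
}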
I /think/ the validation could be fixed by making generic_copy_file_checks
do something like this:

	if (pos_in < 0 || pos_out < 0)
		return -EINVAL;

	size_in = i_size_read(inode_in);
	if (pos_in >= size_in)
		count = 0;
	else
		count = min(count, size_in - pos_in);

	if (check_add_overflow(pos_in, count, &tmp))
		return -EOVERFLOW;
	if (check_add_overflow(pos_out, count, &tmp))
		return -EOVERFLOW;

	ret = generic_write_check_limits(file_out, pos_out, &count);

instead of what it does now... but I'm not convinced the overflow checks do
much since I think we already constrain count so that it can't overflow
either file.

--D
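P.S. A quick userspace sketch of that last point, i.e. why the pos_in
overflow check is probably moot once count has been clamped to EOF: after
the clamp, pos_in + count <= size_in, and size_in is already bounded by the
filesystem's size limit.  (Made-up sizes; s_maxbytes and MIN here are local
stand-ins for the superblock limit and the kernel's min().)

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))

int main(void)
{
	long long s_maxbytes = 1LL << 43;	/* stand-in per-fs size limit */
	long long size_in = s_maxbytes;		/* worst case: file at max size */
	long long pos_in = s_maxbytes - 1;	/* any in-bounds offset */
	size_t count = SIZE_MAX;		/* absurdly large request */

	/* The clamp from the sketch above. */
	if (pos_in >= size_in)
		count = 0;
	else
		count = MIN(count, (size_t)(size_in - pos_in));

	/* After the clamp the sum is bounded by size_in, far below LLONG_MAX. */
	assert(pos_in + (long long)count <= size_in);
	return 0;
}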