On Wed, Jun 12, 2024 at 05:29:48PM +0100, Theodore Ts'o wrote:
> I've been trying to clear various failing or flaky tests, and in that
> context I've been finding that generic/269 is failing with a
> probability of ~5% on a wide variety of test scenarios on ext4, xfs,
> btrfs, and f2fs on 6.10-rc2 and on fs-next.  (See below for the
> details; the failure probability ranges from 1% to 10% depending on
> the test config.)
>
> What generic/269 does is to run fsstress and ENOSPC hitters in
> parallel, and check that the file system is consistent at the end of
> the test.  The failure is caused by the umount of the file system
> failing with EBUSY.  I've tried adding a sync and a "sync -f
> $SCRATCH_MNT" before the attempted _scratch_unmount, and that doesn't
> seem to change the failure.
>
> However, on a failure, if you sleep for 10 seconds and then retry the
> unmount, this seems to make the problem go away.  This is despite the
> fact that we do wait for the fsstress process to exit --- I vaguely
> recall that there is some kind of RCU-related delay which means that
> the umount will not reliably succeed under some circumstances.  Do we
> think this is the right fix?
>
> (Note: when I tried shortening the sleep 10 to sleep 1, the problem
> came back; so this seems like a real hack.  Thoughts?)

I don't see this problem; if you apply this patch to fstests to turn
off io_uring:

https://lore.kernel.org/fstests/169335095953.3534600.16325849760213190849.stgit@frogsfrogsfrogs/#r

do the problems go away?
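If you'd rather not carry the whole patch just to test this, you can
probably get a similar effect by knocking the io_uring ops out of
fsstress by hand.  A rough sketch, assuming your fsstress build has the
uring_read/uring_write ops and that generic/269 picks up FSSTRESS_AVOID
the way most fsstress-based tests do:

	# skip the io_uring read/write ops in fsstress for this run
	FSSTRESS_AVOID="-f uring_read=0 -f uring_write=0" ./check generic/269

Since you're running 6.6+ kernels, you could also rule io_uring out
system-wide with the io_uring_disabled sysctl:

	# 2 == io_uring creation disabled for all users
	sysctl -w kernel.io_uring_disabled=2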
--D

> Thanks,
>
> 				- Ted
>
> diff --git a/tests/generic/269 b/tests/generic/269
> index 29f453735..dad02abf3
> --- a/tests/generic/269
> +++ b/tests/generic/269
> @@ -51,9 +51,12 @@ if ! _workout; then
>  fi
>
>  if ! _scratch_unmount; then
> +	sleep 10
> +	if ! _scratch_unmount ; then
>  	echo "failed to umount"
>  	status=1
>  	exit
> +	fi
>  fi
>  status=0
>  exit
>
> ext4/4k: 50 tests, 2 failures, 1339 seconds
>   Flaky: generic/269: 4% (2/50)
> ext4/1k: 50 tests, 5 failures, 1224 seconds
>   Flaky: generic/269: 10% (5/50)
> ext4/ext3: 50 tests, 1477 seconds
> ext4/encrypt: 50 tests, 2 failures, 1253 seconds
>   Flaky: generic/269: 4% (2/50)
> ext4/nojournal: 50 tests, 1 failures, 1503 seconds
>   Flaky: generic/269: 2% (1/50)
> ext4/ext3conv: 50 tests, 4 failures, 1294 seconds
>   Flaky: generic/269: 8% (4/50)
> ext4/adv: 50 tests, 2 failures, 1263 seconds
>   Flaky: generic/269: 4% (2/50)
> ext4/dioread_nolock: 50 tests, 3 failures, 1327 seconds
>   Flaky: generic/269: 6% (3/50)
> ext4/data_journal: 50 tests, 1 failures, 1317 seconds
>   Flaky: generic/269: 2% (1/50)
> ext4/bigalloc_4k: 50 tests, 2 failures, 1193 seconds
>   Flaky: generic/269: 4% (2/50)
> ext4/bigalloc_1k: 50 tests, 1259 seconds
> ext4/dax: 50 tests, 5 failures, 1136 seconds
>   Flaky: generic/269: 10% (5/50)
> xfs/4k: 50 tests, 3 failures, 1211 seconds
>   Flaky: generic/269: 6% (3/50)
> xfs/1k: 50 tests, 1219 seconds
> xfs/v4: 50 tests, 4 failures, 1206 seconds
>   Flaky: generic/269: 8% (4/50)
> xfs/adv: 50 tests, 1 failures, 1206 seconds
>   Flaky: generic/269: 2% (1/50)
> xfs/quota: 50 tests, 2 failures, 1460 seconds
>   Flaky: generic/269: 4% (2/50)
> xfs/quota_1k: 50 tests, 1449 seconds
> xfs/dirblock_8k: 50 tests, 1 failures, 1351 seconds
>   Flaky: generic/269: 2% (1/50)
> xfs/realtime: 50 tests, 1286 seconds
> xfs/realtime_28k_logdev: 50 tests, 1234 seconds
> xfs/realtime_logdev: 50 tests, 1259 seconds
> xfs/logdev: 50 tests, 3 failures, 1390 seconds
>   Flaky: generic/269: 6% (3/50)
> xfs/dax: 50 tests, 1125 seconds
> btrfs/default: 50 tests, 1573 seconds
> f2fs/default: 50 tests, 1471 seconds
> f2fs/encrypt: 50 tests, 1 failures, 1424 seconds
>   Flaky: generic/269: 2% (1/50)
> Totals: 1350 tests, 0 skipped, 42 failures, 0 errors, 35449s
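As for the patch itself: if a retry does end up being the way to go, a
bounded poll might age better than a single hardcoded sleep 10, since
it returns as soon as the unmount succeeds but still waits up to the
same 10 seconds in the worst case (and sidesteps the single sleep 1
failure you saw).  A rough sketch, untested:

	# retry the unmount for up to 10 seconds before declaring failure
	unmounted=0
	for i in $(seq 1 10); do
		if _scratch_unmount; then
			unmounted=1
			break
		fi
		sleep 1
	done
	if [ $unmounted -eq 0 ]; then
		echo "failed to umount"
		status=1
		exit
	fi
	status=0
	exit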