I've been trying to clear various failing or flaky tests, and in that
context I've been finding that generic/269 is failing with a
probability of ~5% on a wide variety of test scenarios on ext4, xfs,
btrfs, and f2fs on 6.10-rc2 and on fs-next.  (See below for the
details; the failure probability ranges from 1% to 10% depending on
the test config.)

What generic/269 does is run fsstress and ENOSPC hitters in parallel,
and then check that the file system is consistent at the end of the
test.  The failure is caused by the umount of the file system failing
with EBUSY.

I've tried adding a sync and a "sync -f $SCRATCH_MNT" before the
attempted _scratch_unmount, and that doesn't seem to change the
failure rate.  However, on a failure, if you sleep for 10 seconds and
then retry the unmount, the problem seems to go away.  This is despite
the fact that we do wait for the fsstress process to exit --- I
vaguely recall that there is some kind of RCU issue which means that
the umount will not reliably succeed under some circumstances.

Do we think this is the right fix?  (Note: when I tried shortening the
"sleep 10" to "sleep 1", the problem came back, so this does seem like
a real hack.  Thoughts?)

Thanks,

					- Ted

diff --git a/tests/generic/269 b/tests/generic/269
index 29f453735..dad02abf3
--- a/tests/generic/269
+++ b/tests/generic/269
@@ -51,9 +51,12 @@ if ! _workout; then
 fi
 
 if ! _scratch_unmount; then
+	sleep 10
+	if ! _scratch_unmount ; then
 	echo "failed to umount"
 	status=1
 	exit
+	fi
 fi
 status=0
 exit

ext4/4k: 50 tests, 2 failures, 1339 seconds
  Flaky: generic/269: 4% (2/50)
ext4/1k: 50 tests, 5 failures, 1224 seconds
  Flaky: generic/269: 10% (5/50)
ext4/ext3: 50 tests, 1477 seconds
ext4/encrypt: 50 tests, 2 failures, 1253 seconds
  Flaky: generic/269: 4% (2/50)
ext4/nojournal: 50 tests, 1 failures, 1503 seconds
  Flaky: generic/269: 2% (1/50)
ext4/ext3conv: 50 tests, 4 failures, 1294 seconds
  Flaky: generic/269: 8% (4/50)
ext4/adv: 50 tests, 2 failures, 1263 seconds
  Flaky: generic/269: 4% (2/50)
ext4/dioread_nolock: 50 tests, 3 failures, 1327 seconds
  Flaky: generic/269: 6% (3/50)
ext4/data_journal: 50 tests, 1 failures, 1317 seconds
  Flaky: generic/269: 2% (1/50)
ext4/bigalloc_4k: 50 tests, 2 failures, 1193 seconds
  Flaky: generic/269: 4% (2/50)
ext4/bigalloc_1k: 50 tests, 1259 seconds
ext4/dax: 50 tests, 5 failures, 1136 seconds
  Flaky: generic/269: 10% (5/50)
xfs/4k: 50 tests, 3 failures, 1211 seconds
  Flaky: generic/269: 6% (3/50)
xfs/1k: 50 tests, 1219 seconds
xfs/v4: 50 tests, 4 failures, 1206 seconds
  Flaky: generic/269: 8% (4/50)
xfs/adv: 50 tests, 1 failures, 1206 seconds
  Flaky: generic/269: 2% (1/50)
xfs/quota: 50 tests, 2 failures, 1460 seconds
  Flaky: generic/269: 4% (2/50)
xfs/quota_1k: 50 tests, 1449 seconds
xfs/dirblock_8k: 50 tests, 1 failures, 1351 seconds
  Flaky: generic/269: 2% (1/50)
xfs/realtime: 50 tests, 1286 seconds
xfs/realtime_28k_logdev: 50 tests, 1234 seconds
xfs/realtime_logdev: 50 tests, 1259 seconds
xfs/logdev: 50 tests, 3 failures, 1390 seconds
  Flaky: generic/269: 6% (3/50)
xfs/dax: 50 tests, 1125 seconds
btrfs/default: 50 tests, 1573 seconds
f2fs/default: 50 tests, 1471 seconds
f2fs/encrypt: 50 tests, 1 failures, 1424 seconds
  Flaky: generic/269: 2% (1/50)
Totals: 1350 tests, 0 skipped, 42 failures, 0 errors, 35449s
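
P.S.  For what it's worth, if a single fixed 10-second sleep feels too
much like a magic number, one alternative would be to poll for the
unmount to succeed with a bounded retry loop.  This is only an
untested sketch of that idea (the 20 x 0.5s retry budget is an
arbitrary assumption, and the stderr redirect is just to suppress the
intermediate EBUSY noise), not a proposed patch:

	# Sketch only: retry _scratch_unmount with short sleeps instead of
	# one fixed 10-second sleep, giving up after roughly 10 seconds.
	unmounted=0
	for i in $(seq 1 20); do
		if _scratch_unmount 2>/dev/null; then
			unmounted=1
			break
		fi
		sleep 0.5
	done
	if [ $unmounted -eq 0 ]; then
		echo "failed to umount"
		status=1
		exit
	fi
	status=0
	exit

Of course, either way this still papers over whatever is keeping the
superblock busy after fsstress has exited.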