Flaky test: generic:269 (EBUSY on umount)




I've been trying to clear various failing or flaky tests, and in that
context I've been finding that generic/269 is failing with a
probability of ~5% on a wide variety of test scenarios on ext4, xfs,
btrfs, and f2fs on 6.10-rc2 and on fs-next.  (See below for the
details; the failure probability ranges from 1% to 10% depending on
the test config.)

What generic/269 does is run fsstress and ENOSPC hitters in
parallel, and then check that the file system is consistent at the
end of the test.  The failure is caused by the umount of the file
system failing with EBUSY.  I've tried adding a sync and a "sync -f
$SCRATCH_MNT" before the attempted _scratch_unmount, and that doesn't
seem to change the failure.

However, on a failure, if you sleep for 10 seconds and then retry the
unmount, the problem seems to go away.  This is despite the fact that
we do wait for the fsstress process to exit.  I vaguely recall that
there is some kind of RCU issue which means that the umount will not
reliably succeed under some circumstances.  Do we think this is the
right fix?

(Note: when I tried shortening the "sleep 10" to "sleep 1", the
problem came back, so this seems like a real hack.  Thoughts?)

Thanks,

- Ted

diff --git a/tests/generic/269 b/tests/generic/269
index 29f453735..dad02abf3
--- a/tests/generic/269
+++ b/tests/generic/269
@@ -51,9 +51,12 @@ if ! _workout; then
 fi
 
 if ! _scratch_unmount; then
+    sleep 10
+    if ! _scratch_unmount ; then
 	echo "failed to umount"
 	status=1
 	exit
+    fi
 fi
 status=0
 exit
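A possible alternative to the fixed "sleep 10", offered as an untested sketch rather than a proposed fix: poll the unmount at one-second intervals up to a bounded number of attempts, so a run that recovers quickly is not delayed the full 10 seconds, while a slow one still gets a bounded budget. The `retry_cmd` helper below is hypothetical (it is not an existing fstests helper), and like the patch it only works around the EBUSY rather than explaining it.

```shell
# Hypothetical helper (not part of fstests): run a command repeatedly
# until it succeeds or the attempt budget is used up, sleeping between
# attempts.  Returns 0 on success, 1 if every attempt failed.
# Usage: retry_cmd <max_attempts> <delay_secs> <command> [args...]
retry_cmd() {
    local max=$1
    local delay=$2
    local attempt=1
    shift 2
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}
```

In the test it would replace the nested if, e.g. `if ! retry_cmd 20 1 _scratch_unmount; then echo "failed to umount"; status=1; exit; fi`, which gives up after roughly 19 seconds of sleeping.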


ext4/4k: 50 tests, 2 failures, 1339 seconds
  Flaky: generic/269:  4% (2/50)
ext4/1k: 50 tests, 5 failures, 1224 seconds
  Flaky: generic/269: 10% (5/50)
ext4/ext3: 50 tests, 1477 seconds
ext4/encrypt: 50 tests, 2 failures, 1253 seconds
  Flaky: generic/269:  4% (2/50)
ext4/nojournal: 50 tests, 1 failures, 1503 seconds
  Flaky: generic/269:  2% (1/50)
ext4/ext3conv: 50 tests, 4 failures, 1294 seconds
  Flaky: generic/269:  8% (4/50)
ext4/adv: 50 tests, 2 failures, 1263 seconds
  Flaky: generic/269:  4% (2/50)
ext4/dioread_nolock: 50 tests, 3 failures, 1327 seconds
  Flaky: generic/269:  6% (3/50)
ext4/data_journal: 50 tests, 1 failures, 1317 seconds
  Flaky: generic/269:  2% (1/50)
ext4/bigalloc_4k: 50 tests, 2 failures, 1193 seconds
  Flaky: generic/269:  4% (2/50)
ext4/bigalloc_1k: 50 tests, 1259 seconds
ext4/dax: 50 tests, 5 failures, 1136 seconds
  Flaky: generic/269: 10% (5/50)
xfs/4k: 50 tests, 3 failures, 1211 seconds
  Flaky: generic/269:  6% (3/50)
xfs/1k: 50 tests, 1219 seconds
xfs/v4: 50 tests, 4 failures, 1206 seconds
  Flaky: generic/269:  8% (4/50)
xfs/adv: 50 tests, 1 failures, 1206 seconds
  Flaky: generic/269:  2% (1/50)
xfs/quota: 50 tests, 2 failures, 1460 seconds
  Flaky: generic/269:  4% (2/50)
xfs/quota_1k: 50 tests, 1449 seconds
xfs/dirblock_8k: 50 tests, 1 failures, 1351 seconds
  Flaky: generic/269:  2% (1/50)
xfs/realtime: 50 tests, 1286 seconds
xfs/realtime_28k_logdev: 50 tests, 1234 seconds
xfs/realtime_logdev: 50 tests, 1259 seconds
xfs/logdev: 50 tests, 3 failures, 1390 seconds
  Flaky: generic/269:  6% (3/50)
xfs/dax: 50 tests, 1125 seconds
btrfs/default: 50 tests, 1573 seconds
f2fs/default: 50 tests, 1471 seconds
f2fs/encrypt: 50 tests, 1 failures, 1424 seconds
  Flaky: generic/269:  2% (1/50)
Totals: 1350 tests, 0 skipped, 42 failures, 0 errors, 35449s
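As a cross-check of the numbers above: 27 configs at 50 runs each give the 1350 total, and 42 failures out of 1350 runs is an aggregate rate of about 3.1%, with individual configs ranging from 0% to 10%. A small sketch that recomputes this from summary lines in the format above (the `aggregate_rate` name is hypothetical, and it assumes exactly this "N tests, M failures" layout):

```shell
# Hypothetical helper: read per-config summary lines on stdin, sum the
# "N tests" and "M failures" counts, and print the aggregate failure
# rate.  Lines with no failures count (e.g. "xfs/1k: 50 tests, 1219
# seconds") contribute 0 failures; "Flaky:" lines carry neither token;
# the "Totals:" line is skipped so it is not double-counted.
aggregate_rate() {
    awk '
        /^Totals:/ { next }
        {
            for (i = 1; i < NF; i++) {
                if ($(i + 1) ~ /^tests,?$/)    tests += $i
                if ($(i + 1) ~ /^failures,?$/) fails += $i
            }
        }
        END {
            if (tests > 0)
                printf "%d/%d runs failed (%.1f%%)\n", fails, tests, 100 * fails / tests
        }
    '
}
```

Fed the 27 per-config lines above, it should print "42/1350 runs failed (3.1%)".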




