On Tue, Jun 11, 2024 at 09:52:10AM +0100, Theodore Ts'o wrote:
> Hi, I've recently found a flaky test, generic/085 on 6.10-rc2 and
> fs-next.  It's failing on both ext4 and xfs, and it reproduces more
> easily with the dax config:
>
> xfs/4k: 20 tests, 1 failures, 137 seconds
>   Flaky: generic/085:  5% (1/20)
> xfs/dax: 20 tests, 11 failures, 71 seconds
>   Flaky: generic/085: 55% (11/20)
> ext4/4k: 20 tests, 111 seconds
> ext4/dax: 20 tests, 8 failures, 69 seconds
>   Flaky: generic/085: 40% (8/20)
> Totals: 80 tests, 0 skipped, 20 failures, 0 errors, 388s
>
> The failure is caused by a WARN_ON in fs_bdev_thaw() in fs/super.c:
>
> static int fs_bdev_thaw(struct block_device *bdev)
> {
> 	...
> 	sb = get_bdev_super(bdev);
> 	if (WARN_ON_ONCE(!sb))
> 		return -EINVAL;
>
>
> The generic/085 test exercises races between the fs freeze/unfreeze
> and mount/umount code paths, so this appears to be either a VFS-level
> or block device layer bug.  Modulo the warning, it looks relatively
> harmless, so I'll just exclude generic/085 from my test appliance, at
> least for now.  Hopefully someone will have a chance to take a look at
> it?

I think this can happen if fs_bdev_thaw races with unmount?

Let's say that the _umount $lvdev in the second loop in generic/085
starts the unmount process, which clears SB_ACTIVE from the
super_block.  Then the first loop tries to freeze the bdev (and fails),
and immediately tries to thaw the bdev.  The thaw code calls
fs_bdev_thaw because the unmount process is still running and so the fs
is still holding the bdev.  But get_bdev_super sees that SB_ACTIVE has
been cleared from the super_block, so it returns NULL, which trips the
warning.

If that's correct, then I think the WARN_ON_ONCE should go away.

--D

> Thanks,
>
> 					- Ted
>
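
To make the suspected interleaving concrete, here is a toy userspace
model of it in C.  This is only a sketch of the hypothesis above, not
kernel code: every toy_* name is made up for illustration, the "active"
field merely stands in for the SB_ACTIVE check described above, and the
"no holder means no callback, return 0" case is an assumption of the
model rather than a statement about what the block layer does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's structures. */
struct toy_sb {
	bool active;		/* stands in for the SB_ACTIVE check */
};

struct toy_bdev {
	struct toy_sb *holder;	/* fs still holding the bdev, or NULL */
	int warnings;		/* times the modelled warning would fire */
};

/* Models the lookup as described above: no sb once it is inactive. */
static struct toy_sb *toy_get_bdev_super(struct toy_bdev *bdev)
{
	struct toy_sb *sb = bdev->holder;

	if (!sb || !sb->active)
		return NULL;
	return sb;
}

/* Start of unmount: the flag is cleared, the bdev is still held. */
static void toy_umount_begin(struct toy_sb *sb)
{
	sb->active = false;
}

/* End of unmount: the fs lets go of the bdev. */
static void toy_umount_finish(struct toy_bdev *bdev)
{
	bdev->holder = NULL;
}

/* Thaw path: the fs callback runs only while a holder exists. */
static int toy_bdev_thaw(struct toy_bdev *bdev)
{
	assert(bdev != NULL);

	if (!bdev->holder)
		return 0;	/* model assumption: nothing to call */
	if (!toy_get_bdev_super(bdev)) {
		bdev->warnings++;	/* where the warning would trip */
		return -EINVAL;
	}
	return 0;
}

/* Replays the suspected ordering; returns how often the warning fired. */
static int toy_replay_race(void)
{
	struct toy_sb sb = { .active = true };
	struct toy_bdev bdev = { .holder = &sb, .warnings = 0 };

	toy_umount_begin(&sb);		/* second loop: _umount starts */
	(void)toy_bdev_thaw(&bdev);	/* first loop: thaw after failed freeze */
	toy_umount_finish(&bdev);	/* second loop: unmount completes */
	return bdev.warnings;
}
```

In this model the warning fires only in the window between
toy_umount_begin() and toy_umount_finish(); a thaw before or after that
window returns 0.  If the real code behaves the same way, the NULL
return is an expected outcome of the race rather than a bug.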