On Tue, Jun 11, 2024 at 09:37:01AM -0700, Darrick J. Wong wrote:
> On Tue, Jun 11, 2024 at 09:52:10AM +0100, Theodore Ts'o wrote:
> > Hi, I've recently found a flaky test, generic/085 on 6.10-rc2 and
> > fs-next. It's failing on both ext4 and xfs, and it reproduces more
> > easily with the dax config:
> > 
> > xfs/4k: 20 tests, 1 failures, 137 seconds
> > Flaky: generic/085: 5% (1/20)
> > xfs/dax: 20 tests, 11 failures, 71 seconds
> > Flaky: generic/085: 55% (11/20)
> > ext4/4k: 20 tests, 111 seconds
> > ext4/dax: 20 tests, 8 failures, 69 seconds
> > Flaky: generic/085: 40% (8/20)
> > Totals: 80 tests, 0 skipped, 20 failures, 0 errors, 388s
> > 
> > The failure is caused by a WARN_ON in fs_bdev_thaw() in fs/super.c:
> > 
> > static int fs_bdev_thaw(struct block_device *bdev)
> > {
> > 	...
> > 	sb = get_bdev_super(bdev);
> > 	if (WARN_ON_ONCE(!sb))
> > 		return -EINVAL;
> > 
> > The generic/085 test exercises races between the fs freeze/unfreeze
> > and mount/umount code paths, so this appears to be either a
> > VFS-level or block device layer bug. Modulo the warning, it looks
> > relatively harmless, so I'll just exclude generic/085 from my test
> > appliance, at least for now. Hopefully someone will have a chance
> > to take a look at it?
> 
> I think this can happen if fs_bdev_thaw races with unmount?
> 
> Let's say that the _umount $lvdev in the second loop in generic/085
> starts the unmount process, which clears SB_ACTIVE from the
> super_block. Then the first loop tries to freeze the bdev (and
> fails), and immediately tries to thaw the bdev. The thaw code calls
> fs_bdev_thaw because the unmount process is still running & so the
> fs is still holding the bdev. But get_bdev_super sees that SB_ACTIVE
> has been cleared from the super_block so it returns NULL, which
> trips the warning.
> 
> If that's correct, then I think the WARN_ON_ONCE should go away.

I've been trying to reproduce this with pmem yesterday and wasn't
able to.

SB_ACTIVE is cleared in generic_shutdown_super(). If we're in there
we know that there are no active references to the superblock
anymore. That includes freeze requests:

* Freezes are nestable from kernel and userspace but all nested
  freezers share a single active reference in sb->s_active.
* The nested freeze requests are counted in
  sb->s_writers.freeze_{kcount,ucount}.
* The last thaw request (sb->s_writers.freeze_kcount +
  sb->s_writers.freeze_ucount == 0) releases the sb->s_active
  reference.
* Nested freezes from the block layer via bdev_freeze() are
  additionally counted in bdev->bd_fsfreeze_count, protected by
  bdev->bd_fsfreeze_mutex.

The device mapper suspend logic that generic/085 uses relies on
bdev_freeze() and bdev_thaw() from the block layer. So all those dm
freezes should be counted in bdev->bd_fsfreeze_count. And since
device mapper has logic to ensure that only a single freeze request
is ever made, bdev->bd_fsfreeze_count in this test should be 1.

So when a bdev_thaw() request comes via dm_suspend():

* bdev_thaw() is called and encounters bdev->bd_fsfreeze_count == 1.
* As there aren't any fs-initiated freezes we know that
  sb->s_writers.freeze_kcount == 0 and
  sb->s_writers.freeze_ucount == 1 == bdev->bd_fsfreeze_count.
* When fs_bdev_thaw() is called the superblock is still valid and we
  hold at least one active reference, taken during the bdev_freeze()
  request.
* get_bdev_super() tries to grab an active reference to the
  superblock but fails. That can indeed happen because SB_ACTIVE has
  been cleared.
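To make the bookkeeping above concrete, here's a minimal user-space
model of the counting rules. The structs and model_* functions are
simplified stand-ins for sb->s_active,
sb->s_writers.freeze_{kcount,ucount} and bdev->bd_fsfreeze_count; it's
illustrative only, not the kernel implementation:

/*
 * Minimal user-space model of the freeze bookkeeping described above.
 * The structs are simplified stand-ins for struct super_block and
 * struct block_device, not the kernel definitions.
 */
#include <assert.h>
#include <stdio.h>

struct model_sb {
	int s_active;		/* active references to the sb */
	int freeze_kcount;	/* nested kernel-initiated freezes */
	int freeze_ucount;	/* nested userspace/bdev freezes */
};

struct model_bdev {
	int bd_fsfreeze_count;	/* block layer freeze nesting */
	struct model_sb *sb;
};

/* All nested freezers share a single active reference. */
static void model_bdev_freeze(struct model_bdev *bdev)
{
	if (bdev->bd_fsfreeze_count++ > 0)
		return;		/* already frozen via the block layer */
	if (bdev->sb->freeze_kcount + bdev->sb->freeze_ucount == 0)
		bdev->sb->s_active++;	/* first freezer pins the sb */
	bdev->sb->freeze_ucount++;
}

/* The last thaw request releases the shared active reference. */
static void model_bdev_thaw(struct model_bdev *bdev)
{
	if (--bdev->bd_fsfreeze_count > 0)
		return;
	bdev->sb->freeze_ucount--;
	if (bdev->sb->freeze_kcount + bdev->sb->freeze_ucount == 0)
		bdev->sb->s_active--;
}

int main(void)
{
	struct model_sb sb = { .s_active = 1 };	/* ref held by the mount */
	struct model_bdev bdev = { .sb = &sb };

	model_bdev_freeze(&bdev);	/* dm suspend freezes the fs */
	assert(sb.s_active == 2);	/* freeze pinned the superblock */
	model_bdev_thaw(&bdev);		/* dm thaws it again */
	assert(sb.s_active == 1);	/* only the mount reference left */
	printf("s_active=%d after freeze/thaw cycle\n", sb.s_active);
	return 0;
}

In the generic/085 scenario the single dm freeze keeps s_active
pinned above zero for the whole freeze/thaw window, which is exactly
why get_bdev_super() failing during the thaw is surprising.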
But for SB_ACTIVE to be cleared while a freeze is outstanding we
must've dropped the last active reference somewhere, forgotten to
take it during the original freeze, miscounted bdev->bd_fsfreeze_count,
or missed a nested freeze in sb->s_writers.freeze_{kcount,ucount}.

What's the kernel config and test config that's used?
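In the meantime, something like the below might help narrow down
which counter is off. It's an untested debug sketch (dump_thaw_state()
is a made-up helper meant to be called just before the WARN_ON_ONCE()
in fs_bdev_thaw()), not a proper patch:

/*
 * Untested debug sketch, not a proper patch: dump the superblock-side
 * freeze bookkeeping before fs_bdev_thaw() trips its WARN_ON_ONCE().
 * Reading bdev->bd_holder without bd_holder_lock is racy and only
 * acceptable for a debug printk like this.
 */
static void dump_thaw_state(struct block_device *bdev)
{
	struct super_block *sb = bdev->bd_holder;

	if (!sb) {
		pr_warn("fs_bdev_thaw: %pg has no holder\n", bdev);
		return;
	}
	pr_warn("fs_bdev_thaw: %pg s_active=%d freeze_kcount=%d freeze_ucount=%d SB_ACTIVE=%d SB_DYING=%d\n",
		bdev, atomic_read(&sb->s_active),
		sb->s_writers.freeze_kcount, sb->s_writers.freeze_ucount,
		!!(sb->s_flags & SB_ACTIVE), !!(sb->s_flags & SB_DYING));
}

If the counters look sane but SB_ACTIVE/SB_DYING show the superblock
mid-shutdown, that would point at the umount race Darrick described
rather than a miscount.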