On Mon, May 16, 2022 at 06:11:53PM +1000, Dave Chinner wrote: > On Thu, May 12, 2022 at 12:13:29PM -0700, Darrick J. Wong wrote: > > On Thu, May 12, 2022 at 02:02:33PM -0500, Eric Sandeen wrote: > > > On 5/5/22 11:05 AM, Darrick J. Wong wrote: > > > > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > > > > > Make sure we detect and correct mismatches between the V5 features > > > > described in the primary and the secondary superblocks. > > > > > > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > > > > --- > > > > > > > > > > + if ((mp->m_sb.sb_features_incompat ^ sb->sb_features_incompat) & > > > > + ~XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR) { > > > > > > I'd like to add a comment about why XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR is special. > > > (Why is XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR special? Just because userspace doesn't > > > bother to set it on all superblocks in the upgrade paths, right?) > > > > Right -- we only set it on the primary super to force users to run > > xfs_repair. Repair will clear NEEDSREPAIR on the primary and all > > secondaries before it exits > > Oh no it doesn't! :) > > I just debugged the persistent xfs/158 failure down to repair only > writing the secondary superblocks with "features_changed" is true. > > xfs/158 trashes the repair process after setting the inobtcnt and > needrepair feature bits in the primary superblock. That's the only > code that sets the internal "features_changed" flag that triggers > secondary superblock writeback. If repair then dies after this it > won't get set on the next run without the upgrade options set on the > command line. Hence we re-run repair without the upgrade feature > being enabled to check that it still recovers correctly. > > The result is that repair does the right thing in completing the > feature upgrade because it sees the feature flag in the primary > superblock, but it *never sets "features_changed"* and so the > secondary superblocks are not updated when it is done. It then goes > on to check NEEDSREPAIR in the primary superblock and clears it, > allowing the fileystem to mount again. > > This results in secondary superblocks with in-progress set and > mis-matched feature bits, resulting in xfs_scrub reporting corrupt > secondary superblocks and so failing the test. > > This change will probably mask that specific bug - it'll still be > lying their waiting to pounce on some unsuspecting bystander, but it > will be masked by other code detecting the feature mismatch that it > causes and correcting it that way... Ah, ok. You're pointing out that repair needs to set features_changed right after printing "clearing needsrepair flag and regenerating metadata", and that this patch sort of papers over the underlying issue. The genesis of this patch wasn't xfs/158 failing, it was the secondary sb fuzz testing noticing that xfs_scrub reported a failure whereas xfs_repair didn't. So, I'll go add an extra patch to fix the upgrade case, and I think we can let Eric add this one to his tree. --D > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx