On Tue, Jun 28, 2022 at 01:50:40PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > Dave Chinner complained about xfs_scrub failures coming from xfs/158. > That test induces xfs_repair to fail while upgrading a filesystem to > have the inobtcount feature, and then restarts xfs_repair to finish the > upgrade. When the second xfs_repair run starts, it will find that the > primary super has NEEDSREPAIR set, along with whatever new feature that > we were trying to add to the filesystem. > > From there, repair completes the upgrade in much the same manner as the > first repair run would have, with one big exception -- it forgets to set > features_changed to trigger rewriting of the secondary supers at the end > of repair. This results in discrepancies between the supers: > > # XFS_REPAIR_FAIL_AFTER_PHASE=2 xfs_repair -c inobtcount=1 /dev/sdf > Phase 1 - find and verify superblock... > Phase 2 - using internal log > - zero log... > - scan filesystem freespace and inode maps... > - found root inode chunk > Adding inode btree counts to filesystem. > Killed > # xfs_repair /dev/sdf > Phase 1 - find and verify superblock... > Phase 2 - using internal log > - zero log... > - scan filesystem freespace and inode maps... > clearing needsrepair flag and regenerating metadata > bad inobt block count 0, saw 1 > bad finobt block count 0, saw 1 > bad inobt block count 0, saw 1 > bad finobt block count 0, saw 1 > bad inobt block count 0, saw 1 > bad finobt block count 0, saw 1 > bad inobt block count 0, saw 1 > bad finobt block count 0, saw 1 > - found root inode chunk > Phase 3 - for each AG... > - scan and clear agi unlinked lists... > - process known inodes and perform inode discovery... > - agno = 0 > - agno = 1 > - agno = 2 > - agno = 3 > - process newly discovered inodes... > Phase 4 - check for duplicate blocks... > - setting up duplicate extent list... > - check for inodes claiming duplicate blocks... > - agno = 1 > - agno = 2 > - agno = 0 > - agno = 3 > Phase 5 - rebuild AG headers and trees... > - reset superblock... > Phase 6 - check inode connectivity... > - resetting contents of realtime bitmap and summary inodes > - traversing filesystem ... > - traversal finished ... > - moving disconnected inodes to lost+found ... > Phase 7 - verify and correct link counts... > done > # xfs_db -c 'sb 0' -c 'print' -c 'sb 1' -c 'print' /dev/sdf | \ > egrep '(features_ro_compat|features_incompat)' > features_ro_compat = 0xd > features_incompat = 0xb > features_ro_compat = 0x5 > features_incompat = 0xb > > Curiously, re-running xfs_repair will not trigger any warnings about the > featureset mismatch between the primary and secondary supers. xfs_scrub > immediately notices, which is what causes xfs/158 to fail. > > This discrepancy doesn't happen when the upgrade completes successfully > in a single repair run, so we need to teach repair to rewrite the > secondaries at the end of repair any time needsrepair was set. > > Reported-by: Dave Chinner <david@xxxxxxxxxxxxx> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > --- > repair/agheader.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > > diff --git a/repair/agheader.c b/repair/agheader.c > index 36da1395..e91509d0 100644 > --- a/repair/agheader.c > +++ b/repair/agheader.c > @@ -552,6 +552,14 @@ secondary_sb_whack( > else > do_warn( > _("would clear needsrepair flag and regenerate metadata\n")); > + /* > + * If needsrepair is set on the primary super, there's > + * a possibility that repair crashed during an upgrade. > + * Set features_changed to ensure that the secondary > + * supers are rewritten with the new feature bits once > + * we've finished the upgrade. > + */ > + features_changed = true; > } else { > /* > * Quietly clear needsrepair on the secondary supers as > > Looks good. Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> -- Dave Chinner david@xxxxxxxxxxxxx