On Mon, Jan 19, 2015 at 09:30:21AM -0500, Brian Foster wrote:
> The secondary superblock verification in xfs_repair was subject to a bug
> that unnecessarily leads to a brute force superblock scan if the last
> superblock in the fs happens to be corrupt. Normally, xfs_repair handles
> one-off superblock corruption gracefully using a heuristic that finds
> the most consistent superblock content across the set of secondary
> superblocks.
>
> Create a regression test for xfs_repair that corrupts the last
> superblock in the fs. Verify the superblock is updated from the
> previously verified sb content and a brute force scan is not initiated.
> In the event of failure, detect that a brute force scan has started and
> abort the repair in order to fail the test quickly.
>
> To support the test, extend the xfs_repair filter to handle corrupted
> superblock repair output and provide generic test output for arbitrary
> AG counts.
>
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>
> Hi all,
>
> This is an xfs_repair regression test to trigger the problem fixed by
> the following previously posted fix:
>
> 	http://oss.sgi.com/archives/xfs/2015-01/msg00244.html
>
> Thoughts appreciated, thanks.

...

> +# Start and monitor an xfs_repair of the scratch device. This test can induce a
> +# time consuming brute force superblock scan. Since a brute force scan means
> +# test failure, detect it and end the repair.
> +_xfs_repair_noscan()
> +{
> +	# invoke repair directly so we can kill the process if need be
> +	$XFS_REPAIR_PROG $SCRATCH_DEV 2>&1 | tee -a $seqres.full > $tmp.repair &
> +	repair_pid=$!
> +
> +	# monitor progress for as long as it is running
> +	while [ `ps -q $repair_pid > /dev/null; echo $?` == 0 ]; do

	while [ `pgrep xfs_repair` -eq 0 ]; do

> +		grep "couldn't verify primary superblock" $tmp.repair \
> +			> /dev/null 2>&1
> +		if [ $? == 0 ]; then
> +			# we've started a brute force scan. kill repair and
> +			# fail the test
> +			kill -9 $repair_pid >> $seqres.full 2>&1
> +			wait >> $seqres.full 2>&1
> +
> +			_fail "xfs_repair resorted to brute force scan"
> +		fi
> +
> +		sleep 1
> +	done
> +
> +	wait
> +
> +	cat $tmp.repair | _filter_repair
> +}
> +
> +rm -f $seqres.full
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/repair
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs xfs
> +_supported_os Linux
> +_require_scratch_nocheck
> +
> +_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
> +
> +# corrupt the last secondary sb in the fs
> +agcount=`$XFS_DB_PROG -c "sb" -c "p agcount" $SCRATCH_DEV | awk '{ print $3 }'`

_scratch_mkfs | _filter_mkfs 2> $tmp.mkfs
. $tmp.mkfs

And now you have the agcount variable already set up (and most other
fs geometry variables that mkfs outputs).

> +last_secondary=$((agcount - 1))
> +$XFS_DB_PROG -x -c "sb $last_secondary" -c "type data" \

You can just use "sb $((agcount - 1))" directly. The comment above
tells us that it's the last secondary sb we are corrupting....

Otherwise looks ok.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
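
For anyone following along, here is a minimal standalone sketch of the two suggestions in the review: source a file of name=value geometry lines to get agcount, then compute the last superblock index inline. The file contents are made-up stand-ins for what _filter_mkfs would write to stderr, and the names tmp and sb_cmd are hypothetical. Nothing here touches a real device.

```shell
#!/bin/sh
# Stand-in for the geometry file produced by "_filter_mkfs 2> $tmp.mkfs".
# These values are invented for illustration only.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
agcount=4
agsize=65536
EOF

# Sourcing the file sets agcount (and the other variables) in this shell.
. "$tmp"
rm -f "$tmp"

# Index of the last secondary superblock, computed inline with no
# intermediate variable, as suggested in the review.
sb_cmd="sb $((agcount - 1))"
echo "$sb_cmd"
```

In the real test, the string would be passed to xfs_db via -c rather than echoed.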