On Wed, Sep 18, 2019 at 11:33:12PM -0700, Darrick J. Wong wrote:
> On Thu, Sep 19, 2019 at 01:20:33PM +0800, Zorro Lang wrote:
> > On Wed, Sep 18, 2019 at 04:10:50PM -0700, Darrick J. Wong wrote:
> > > On Wed, Sep 18, 2019 at 09:37:11AM -0700, Darrick J. Wong wrote:
> > > > On Wed, Sep 18, 2019 at 11:24:47AM +0800, Yang Xu wrote:
> > > > >
> > > > >
> > > > > on 2019/09/18 10:59, Zorro Lang wrote:
> > > > > > xfs/030 is weird, I found it a long time ago.
> > > > > >
> > > > > > If I do a 'whole disk mkfs' (_scratch_mkfs_xfs), before this sized mkfs:
> > > > > >
> > > > > >   _scratch_mkfs_xfs $DSIZE >/dev/null 2>&1
> > > > > >
> > > > > > Everything looks clear, and the test passes. I can't send a patch to do this,
> > > > > > because I don't know the reason.
> > > > > Yes. I also found yesterday that running _scratch_mkfs_xfs in xfs/030 can solve
> > > > > this problem. Or, we can adjust the _try_wipe_scratch_devs order in check
> > > > > (but I don't have enough reason to explain why adjusting it helps), as below:
> > > >
> > > > (Yeah, I don't see any obvious reason why that would change outcomes...)
> > > >
> > > > > --- a/check
> > > > > +++ b/check
> > > > > @@ -753,7 +753,6 @@ for section in $HOST_OPTIONS_SECTIONS; do
> > > > >  		# _check_dmesg depends on this log in dmesg
> > > > >  		touch ${RESULT_DIR}/check_dmesg
> > > > >  	fi
> > > > > -	_try_wipe_scratch_devs > /dev/null 2>&1
> > > > >  	if [ "$DUMP_OUTPUT" = true ]; then
> > > > >  		_run_seq 2>&1 | tee $tmp.out
> > > > >  		# Because $? would get tee's return code
> > > > > @@ -799,7 +798,7 @@ for section in $HOST_OPTIONS_SECTIONS; do
> > > > >  		# Scan for memory leaks after every test so that associating
> > > > >  		# a leak to a particular test will be as accurate as
> > > > >  		# possible.
> > > > >  		_check_kmemleak || err=true
> > > > > -
> > > > > +	_try_wipe_scratch_devs > /dev/null 2>&1
> > > > >  		# test ends after all checks are done.
> > > > >  		$timestamp && _timestamp
> > > > >  		stop=`_wallclock`
> > > > >
> > > > >
> > > > > > I'm not familiar with xfs_repair so much, so I don't know what happens
> > > > > > underlying. I suppose the part after the $DSIZE affects the xfs_repair,
> > > > > > but I don't know why the wipefs can cause that; wipefs only erases 4 bytes
> > > > > > at the beginning.
> > > > > >
> > > > > I am still looking for the reason. It seems wipefs wipes important information
> > > > > and the $DSIZE option (using a single agcount or dsize, it also fails) can not
> > > > > format the disk completely. If we use other options, it can pass.
> > > >
> > > > How does mkfs fail, specifically?
> > > >
> > > > Also, what's your storage configuration?  And lsblk -D output?
> > >
> > > I'm still interested in the answer to these questions, but I've done a
> > > little more research and noticed that yes, xfs/030 fails if the device
> > > doesn't support zeroing discard.
> > >
> > > First, if mkfs.xfs detects an old primary superblock, it will write
> > > zeroes to all superblocks before formatting the new filesystem.
> > > Obviously this won't be done if the device doesn't have a primary
> > > superblock.
> > >
> > > (1) So let's say that a previous test formatted a 4GB scratch disk with
> > > all defaults, and let's say that we have 4 AGs. The disk will look like
> > > this:
> > >
> > > SB0 [1G space] SB1 [1G space] SB2 [1G space] SB3 [1G space]
> > >
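(Just to put numbers on the layout above: a quick sketch assuming the default geometry described here, 4 AGs of 1GiB each with 4096-byte blocks. The values are illustrative, not taken from a real mkfs run; superblock N sits in the first block of AG N:)

	# assumed example geometry: 4 AGs, 1GiB each, 4096-byte blocks
	blocksize=4096
	agsize_blocks=$(( (1024 * 1024 * 1024) / blocksize ))
	for n in 0 1 2 3; do
		# superblock N is the first block of AG N
		echo "SB$n at byte offset $(( n * agsize_blocks * blocksize ))"
	done
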
> > > (2) Now we _try_wipe_scratch_devs, which wipes out the primary label:
> > >
> > > 000 [1G space] SB1 [1G space] SB2 [1G space] SB3 [1G space]
> > >
> > > (3) Now xfs/030 runs its special mkfs command (6 AGs, 100MB disk). If the
> > > disk supports zeroing discard, it will discard the whole device:
> > >
> > > <4GB of zeroes>
> > >
> > > (4) Then it will lay down its own filesystem:
> > >
> > > SB0 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 4G>
> > >
> > > (5) Next, xfs/030 zaps the primary superblock:
> > >
> > > 000 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 4G>
> > >
> > > (6) Next, xfs/030 runs xfs_repair. It fails to find the primary sb, so it
> > > tries to find secondary superblocks. Its first strategy is to compute
> > > the fs geometry assuming all default options. In this case, that means
> > > 4 AGs, spaced 1G apart. They're all zero, so it falls back to a linear
> > > scan of the disk. It finds SB1, uses that to rewrite the primary super,
> > > and continues with the repair (which is mostly uneventful). The test
> > > passes; this is why it works on my computer.
> > >
> > > ---------
> > >
> > > Now let's see what happened before _try_wipe_scratch_devs. In step (3)
> > > mkfs would find the old superblocks and wipe the superblocks, before
> > > laying down the new superblocks:
> > >
> > > SB0 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
> > >     000 [1G space] 000 [1G space] 000 [1G space]
> > >
> > > Step (5) zaps the primary, yielding:
> > >
> > > 000 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
> > >     000 [1G space] 000 [1G space] 000 [1G space]
> > >
> > > Step (6) fails to find a primary superblock so it tries to read backup
> > > superblocks at 1G, 2G, and 3G, but they're all zero so it falls back to
> > > the linear scan and picks up SB1 and proceeds with a mostly uneventful
> > > repair. The test passes.
> > >
> > > ---------
> > >
> > > However, with _try_wipe_scratch_devs and a device that doesn't support
> > > discard (or MKFS_OPTIONS includes -K), we have a problem. mkfs.xfs
> > > doesn't discard the device nor does it find a primary superblock, so it
> > > simply formats the new filesystem. We end up with:
> > >
> > > SB0 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
> > >     SB'1 [1G space] SB'2 [1G space] SB'3 [1G space]
> > >
> > > Where SB[0-5] are from the filesystem that xfs/030 formatted but
> > > SB'[1-3] are from the filesystem that was on the scratch disk before
> > > xfs/030 even started. Uhoh.
> > >
> > > Step (5) zaps the primary, yielding:
> > >
> > > 000 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
> > >     SB'1 [1G space] SB'2 [1G space] SB'3 [1G space]
> > >
> > > Step (6) fails to find a primary superblock so it tries to read backup
> > > superblocks at 1G. It finds SB'1 and uses that to reconstruct the /old/
> > > filesystem, with what looks like massive filesystem damage. This
> > > results in test failure. Oops.
> > >
> > > ----------
> > >
> > > The reason for adding _try_wipe_scratch_devs was to detect broken tests
> > > that started using the filesystem on the scratch device (if any) before
> > > (or without!) formatting the scratch device. That broken behavior could
> > > result in spurious test failures when xfstests was run in random order
> > > mode either due to mounting an unformatted device or mounting a corrupt
> > > fs that some other test left behind.
> > >
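(If anyone wants to confirm this failure mode on their own setup, here is a rough way to look for a stale secondary superblock left at the 1G mark. It assumes the 4096-byte block size and 1GiB AG spacing from the example above; a leftover superblock shows up as the "XFSB" magic at the start of that block:)

	# read the block at the 1GiB offset of the scratch device and check for
	# the XFS superblock magic "XFSB" (0x58465342) at the start of it
	off=$((1024 * 1024 * 1024))
	dd if=$SCRATCH_DEV bs=4096 skip=$((off / 4096)) count=1 2>/dev/null | hexdump -C | head -n 1
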
> > > I guess a fix for XFS would be to have _try_wipe_scratch_devs try to read
> > > the primary superblock to compute the AG geometry and then erase all
> > > superblocks that could be on the disk; and then compute the default
> > > geometry and wipe out all those superblocks too.
> > >
> > > Does any of that square with what you've been seeing?
> >
> > Thanks Darrick, so what I supposed might be true?
> > "
> > > > > > I'm not familiar with xfs_repair so much, so I don't know what happens
> > > > > > underlying. I suppose the part after the $DSIZE affects the xfs_repair,
> > "
> >
> > The sized mkfs.xfs (without discard) leaves the old on-disk structures behind
> > the $DSIZE space, which causes xfs_repair to try to use odd things to do the
> > checking.
> >
> > When I tried to erase the 1st block of each AG [1], the test passed [2].
> > Is that what you talked about above?
>
> Yep.  The version I wrote also uses xfs_db to see if there's a primary
> sb at block zero that can point us to other backup superblocks to zap,
> though if you want to send your version go ahead because I'll be busy
> for a while dealing with this iomap mess. :/

Yeah, I saw Dave talking about that with you, so I never thought you were
working on this small issue ;)

As this test failure brings more trouble to us (than to you), I'd like to
fix it if you have other work to do.

Thanks,
Zorro

>
> --D
>
> > Thanks,
> > Zorro
> >
> > [1]
> > diff --git a/common/rc b/common/rc
> > index e0b087c1..19b7ab02 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -4048,6 +4048,10 @@ _try_wipe_scratch_devs()
> >  	for dev in $SCRATCH_DEV_POOL $SCRATCH_DEV $SCRATCH_LOGDEV $SCRATCH_RTDEV; do
> >  		test -b $dev && $WIPEFS_PROG -a $dev
> >  	done
> > +
> > +	if [ "$FSTYP" = "xfs" ];then
> > +		_try_wipe_scratch_xfs
> > +	fi
> >  }
> >
> >  # Only run this on xfs if xfs_scrub is available and has the unicode checker
> > diff --git a/common/xfs b/common/xfs
> > index 1bce3c18..53f33d12 100644
> > --- a/common/xfs
> > +++ b/common/xfs
> > @@ -884,3 +884,24 @@ _xfs_mount_agcount()
> >  {
> >  	$XFS_INFO_PROG "$1" | grep agcount= | sed -e 's/^.*agcount=\([0-9]*\),.*$/\1/g'
> >  }
> > +
> > +_try_wipe_scratch_xfs()
> > +{
> > +	local tmp=`mktemp -u`
> > +
> > +	_scratch_mkfs_xfs -N 2>/dev/null | perl -ne '
> > +		if (/^meta-data=.*\s+agcount=(\d+), agsize=(\d+) blks/) {
> > +			print STDOUT "agcount=$1\nagsize=$2\n";
> > +		}
> > +		if (/^data\s+=\s+bsize=(\d+)\s/) {
> > +			print STDOUT "dbsize=$1\n";
> > +		}' > $tmp.mkfs
> > +	. $tmp.mkfs
> > +	if [ -n "$agcount" -a -n "$agsize" -a -n "$dbsize" ];then
> > +		for((i=0; i<agcount; i++)); do
> > +			$XFS_IO_PROG -c "pwrite $((i * dbsize * agsize)) $dbsize" \
> > +				$SCRATCH_DEV >/dev/null;
> > +		done
> > +	fi
> > +	rm -f $tmp.mkfs
> > +}
> >
> > [2]
> > # ./check xfs/030
> > FSTYP         -- xfs (non-debug)
> > PLATFORM      -- Linux/x86_64 xxx-xxxx-xx xxx-xxxx-xx-xxx
> > MKFS_OPTIONS  -- -f -bsize=4096 /dev/mapper/scratchdev
> > MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/mapper/scratchdev /mnt/scratch
> >
> > xfs/030 24s ... 25s
> > Ran: xfs/030
> > Passed all 1 tests
> >
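For the record, here is roughly how I imagine the xfs_db variant would look. This is only a guess at the approach described above, not Darrick's actual version; it assumes the usual fstests variables ($XFS_DB_PROG, $XFS_IO_PROG, $SCRATCH_DEV) and the superblock field names that xfs_db prints:

	# if block 0 still holds a primary superblock, use its geometry to find
	# the other superblocks and clobber the first block of each AG
	if $XFS_DB_PROG -c "sb 0" -c "p magicnum" $SCRATCH_DEV 2>/dev/null | grep -q 0x58465342; then
		blksz=$($XFS_DB_PROG -c "sb 0" -c "p blocksize" $SCRATCH_DEV | awk '{print $3}')
		agblocks=$($XFS_DB_PROG -c "sb 0" -c "p agblocks" $SCRATCH_DEV | awk '{print $3}')
		agcount=$($XFS_DB_PROG -c "sb 0" -c "p agcount" $SCRATCH_DEV | awk '{print $3}')
		for ((i = 0; i < agcount; i++)); do
			$XFS_IO_PROG -c "pwrite $((i * agblocks * blksz)) $blksz" \
				$SCRATCH_DEV >/dev/null
		done
	fi
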
> > >
> > > --D
> > >
> > > > --D
> > > >
> > > > > > Darrick, do you know more about that?
> > > > > >
> > > > > > Thanks,
> > > > > > Zorro
> > > > > >
> > > > > > > > xfs/148 is a clone of test 030 using xfs_prepair64 instead of xfs_repair.
> > > > > > > > xfs/149 is a clone of test 031 using xfs_prepair instead of xfs_repair
> > > > > > I'm not worried about it too much, due to it always being 'not run' and never
> > > > > > failing.
> > > > > Yes. But I prefer to remove them because IMO they are useless.
> > > > >
> > > > > > xfs/148 [not run] parallel repair binary xfs_prepair64 is not installed
> > > > > > xfs/149 [not run] parallel repair binary xfs_prepair is not installed
> > > > > > Ran: xfs/148 xfs/149
> > > > > > Not run: xfs/148 xfs/149
> > > > > > Passed all 2 tests
> > > > > >