Re: question of xfs/148 and xfs/149

On Thu, Sep 19, 2019 at 01:20:33PM +0800, Zorro Lang wrote:
> On Wed, Sep 18, 2019 at 04:10:50PM -0700, Darrick J. Wong wrote:
> > On Wed, Sep 18, 2019 at 09:37:11AM -0700, Darrick J. Wong wrote:
> > > On Wed, Sep 18, 2019 at 11:24:47AM +0800, Yang Xu wrote:
> > > > 
> > > > 
> > > > on 2019/09/18 10:59, Zorro Lang wrote:
> > > > > xfs/030 is weird, I found it a long time ago.
> > > > > 
> > > > > If I do a 'whole disk mkfs' (_scratch_mkfs_xfs), before this sized mkfs:
> > > > > 
> > > > >    _scratch_mkfs_xfs $DSIZE >/dev/null 2>&1
> > > > > 
> > > > > Everything looks clear, and the test passes. I can't send a patch to do this,
> > > > > because I don't know the reason.
> > > > Yes. I also found yesterday that running _scratch_mkfs_xfs in xfs/030 can solve
> > > > this problem. Or we can adjust the _try_wipe_scratch_devs order in
> > > > check (but I don't have enough reason to explain why adjusting it helps), as below:
> > > 
> > > (Yeah, I don't see any obvious reason why that would change outcomes...)
> > > 
> > > > --- a/check
> > > > +++ b/check
> > > > @@ -753,7 +753,6 @@ for section in $HOST_OPTIONS_SECTIONS; do
> > > >                         # _check_dmesg depends on this log in dmesg
> > > >                         touch ${RESULT_DIR}/check_dmesg
> > > >                 fi
> > > > -               _try_wipe_scratch_devs > /dev/null 2>&1
> > > >                 if [ "$DUMP_OUTPUT" = true ]; then
> > > >                         _run_seq 2>&1 | tee $tmp.out
> > > >                         # Because $? would get tee's return code
> > > > @@ -799,7 +798,7 @@ for section in $HOST_OPTIONS_SECTIONS; do
> > > >                 # Scan for memory leaks after every test so that associating
> > > >                 # a leak to a particular test will be as accurate as possible.
> > > >                 _check_kmemleak || err=true
> > > > -
> > > > +               _try_wipe_scratch_devs > /dev/null 2>&1
> > > >                 # test ends after all checks are done.
> > > >                 $timestamp && _timestamp
> > > >                 stop=`_wallclock`
> > > > 
> > > > > 
> > > > > I'm not so familiar with xfs_repair, so I don't know what happens
> > > > > underneath. I suppose the part after $DSIZE affects xfs_repair,
> > > > > but I don't know why wipefs can cause that; wipefs only erases 4 bytes
> > > > > at the beginning.
> > > > > 
> > > >  I am looking for the reason. It seems wipefs wipes important information and
> > > > the $DSIZE option (it also fails with a single agcount or dsize) cannot format
> > > > the disk completely. If we use other options, it passes.
> > > 
> > > How does mkfs fail, specifically?
> > > 
> > > Also, what's your storage configuration?  And lsblk -D output?
> > 
> > I'm still interested in the answer to these questions, but I've done a
> > little more research and noticed that yes, xfs/030 fails if the device
> > doesn't support zeroing discard.
> > 
> > First, if mkfs.xfs detects an old primary superblock, it will write
> > zeroes to all superblocks before formatting the new filesystem.
> > Obviously this won't be done if the device doesn't have a primary
> > superblock.
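> > 
> > (A quick way to check whether mkfs.xfs will have anything to key off of
> > is to list the signatures on the scratch device first; a minimal sketch:)
> > 
> >   # Without -a, wipefs only lists the signatures it can see; it does not
> >   # erase anything.  If no XFS signature shows up at offset 0, mkfs.xfs
> >   # has no primary superblock to trigger its "zero the old secondaries"
> >   # behavior.
> >   wipefs $SCRATCH_DEV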
> > 
> > (1) So let's say that a previous test formatted a 4GB scratch disk with
> > all defaults, and let's say that we have 4 AGs.  The disk will look like
> > this:
> > 
> >   SB0 [1G space] SB1 [1G space] SB2 [1G space] SB3 [1G space]
> > 
> > (2) Now we _try_wipe_scratch_devs, which wipes out the primary label:
> > 
> >   000 [1G space] SB1 [1G space] SB2 [1G space] SB3 [1G space]
> > 
> > (3) Now xfs/030 runs its special mkfs command (6 AGs, 100MB disk).  If the
> > disk supports zeroing discard, it will discard the whole device:
> > 
> >   <4GB of zeroes>
> > 
> > (4) Then it will lay down its own filesystem:
> > 
> >   SB0 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 4G>
> > 
> > (5) Next, xfs/030 zaps the primary superblock:
> > 
> >   000 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 4G>
> > 
> > (6) Next, xfs/030 runs xfs_repair.  It fails to find the primary sb, so it
> > tries to find secondary superblocks.  Its first strategy is to compute
> > the fs geometry assuming all default options.  In this case, that means
> > 4 AGs, spaced 1G apart.  They're all zero, so it falls back to a linear
> > scan of the disk.  It finds SB1, uses that to rewrite the primary super,
> > and continues with the repair (which is mostly uneventful).  The test
> > passes; this is why it works on my computer.
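> > 
> > (You can watch that search without letting repair touch the disk; sketch
> > only, using repair's no-modify mode:)
> > 
> >   # -n puts xfs_repair in no-modify mode: it reports what it finds,
> >   # including the fallback hunt for secondary superblocks, but writes
> >   # nothing back to the device.
> >   xfs_repair -n $SCRATCH_DEV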
> > 
> > ---------
> > 
> > Now let's see what happened before _try_wipe_scratch_devs existed.  In
> > step (3), mkfs would find the old superblocks and wipe them before
> > laying down the new superblocks:
> > 
> >   SB0 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
> >       000 [1G space] 000 [1G space] 000 [1G space]
> > 
> > Step (5) zaps the primary, yielding:
> > 
> >   000 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
> >       000 [1G space] 000 [1G space] 000 [1G space]
> > 
> > Step (6) fails to find a primary superblock so it tries to read backup
> > superblocks at 1G, 2G, and 3G, but they're all zero so it falls back to
> > the linear scan and picks up SB1 and proceeds with a mostly uneventful
> > repair.  The test passes.
> > 
> > ---------
> > 
> > However, with _try_wipe_scratch_devs and a device that doesn't support
> > discard (or MKFS_OPTIONS includes -K), we have a problem.  mkfs.xfs
> > doesn't discard the device nor does it find a primary superblock, so it
> > simply formats the new filesystem.  We end up with:
> > 
> >   SB0 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
> >       SB'1 [1G space] SB'2 [1G space] SB'3 [1G space]
> > 
> > Where SB[0-5] are from the filesystem that xfs/030 formatted but
> > SB'[1-3] are from the filesystem that was on the scratch disk before
> > xfs/030 even started.  Uhoh.
> > 
> > Step (5) zaps the primary, yielding:
> > 
> >   000 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
> >       SB'1 [1G space] SB'2 [1G space] SB'3 [1G space]
> > 
> > Step (6) fails to find a primary superblock so it tries to read backup
> > superblocks at 1G.  It finds SB'1 and uses that to reconstruct the /old/
> > filesystem, with what looks like massive filesystem damage.  This
> > results in test failure.  Oops.
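> > 
> > (One way to confirm the stale secondaries really are still out there is
> > to look for the XFSB magic at the old AG offsets before running repair;
> > a hedged sketch, assuming the 4GB/4AG layout above:)
> > 
> >   # The old default-geometry filesystem put secondary superblocks at 1G,
> >   # 2G and 3G.  Dump the first bytes at each candidate offset and look
> >   # for the "XFSB" superblock magic left behind.
> >   for off_gb in 1 2 3; do
> >       dd if=$SCRATCH_DEV bs=1M skip=$((off_gb * 1024)) count=1 2>/dev/null |
> >           hexdump -C | head -n 1
> >   done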
> > 
> > ----------
> > 
> > The reason for adding _try_wipe_scratch_devs was to detect broken tests
> > that started using the filesystem on the scratch device (if any) before
> > (or without!) formatting the scratch device.  That broken behavior could
> > result in spurious test failures when xfstests was run in random order
> > mode either due to mounting an unformatted device or mounting a corrupt
> > fs that some other test left behind.
> > 
> > I guess a fix for XFS would be to have _try_wipe_scratch_devs try to read
> > the primary superblock to compute the AG geometry and then erase all
> > superblocks that could be on the disk; and then compute the default
> > geometry and wipe out all those superblocks too.
> > 
> > Does any of that square with what you've been seeing?
> 
> Thanks Darrick, so what I supposed might be true?
> "
>   > > > > I'm not so familiar with xfs_repair, so I don't know what happens
>   > > > > underneath. I suppose the part after $DSIZE affects xfs_repair,
> "
> 
> The sized mkfs.xfs (without discard) leaves old on-disk structures behind the
> $DSIZE space, which causes xfs_repair to try to use stale things for its checking.
> 
> When I tried to erase the 1st block of each AG [1], the test passed [2].
> Is that what you were talking about above?

Yep.  The version I wrote also uses xfs_db to see if there's a primary
sb at block zero that can point us to other backup superblocks to zap,
though if you want to send your version, go ahead, because I'll be busy
for a while dealing with this iomap mess. :/
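
Something along these lines, roughly (an untested sketch, not the exact
patch; _scratch_xfs_db and $XFS_IO_PROG are the usual xfstests helpers,
and the parsing assumes xfs_db's "field = value" output):

  # If block zero still holds a primary superblock, pull the real geometry
  # out of it and zero one fs block at the start of every AG so that no
  # backup superblock survives for repair to stumble over.
  agcount=`_scratch_xfs_db -c "sb 0" -c "p agcount" 2>/dev/null | awk '{print $3}'`
  agblocks=`_scratch_xfs_db -c "sb 0" -c "p agblocks" 2>/dev/null | awk '{print $3}'`
  blocksize=`_scratch_xfs_db -c "sb 0" -c "p blocksize" 2>/dev/null | awk '{print $3}'`
  if [ -n "$agcount" -a -n "$agblocks" -a -n "$blocksize" ]; then
      for ((i = 0; i < agcount; i++)); do
          $XFS_IO_PROG -c "pwrite $((i * agblocks * blocksize)) $blocksize" \
              $SCRATCH_DEV >/dev/null
      done
  fi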

--D

> Thanks,
> Zorro
> 
> [1]
> diff --git a/common/rc b/common/rc
> index e0b087c1..19b7ab02 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4048,6 +4048,10 @@ _try_wipe_scratch_devs()
>         for dev in $SCRATCH_DEV_POOL $SCRATCH_DEV $SCRATCH_LOGDEV $SCRATCH_RTDEV; do
>                 test -b $dev && $WIPEFS_PROG -a $dev
>         done
> +
> +       if [ "$FSTYP" = "xfs" ];then
> +               _try_wipe_scratch_xfs
> +       fi
>  }
>  
>  # Only run this on xfs if xfs_scrub is available and has the unicode checker
> diff --git a/common/xfs b/common/xfs
> index 1bce3c18..53f33d12 100644
> --- a/common/xfs
> +++ b/common/xfs
> @@ -884,3 +884,24 @@ _xfs_mount_agcount()
>  {
>         $XFS_INFO_PROG "$1" | grep agcount= | sed -e 's/^.*agcount=\([0-9]*\),.*$/\1/g'
>  }
> +
> +_try_wipe_scratch_xfs()
> +{
> +       local tmp=`mktemp -u`
> +
> +       _scratch_mkfs_xfs -N 2>/dev/null | perl -ne '
> +               if (/^meta-data=.*\s+agcount=(\d+), agsize=(\d+) blks/) {
> +                       print STDOUT "agcount=$1\nagsize=$2\n";
> +               }
> +               if (/^data\s+=\s+bsize=(\d+)\s/) {
> +                       print STDOUT "dbsize=$1\n";
> +               }' > $tmp.mkfs
> +       . $tmp.mkfs
> +       if [ -n "$agcount" -a -n "$agsize" -a -n "$dbsize" ];then
> +               for((i=0; i<agcount; i++)); do
> +                       $XFS_IO_PROG -c "pwrite $((i * dbsize * agsize)) $dbsize" \
> +                               $SCRATCH_DEV >/dev/null;
> +               done
> +       fi
> +       rm -f $tmp.mkfs
> +}
> 
> [2]
> # ./check xfs/030
> FSTYP         -- xfs (non-debug)
> PLATFORM      -- Linux/x86_64 xxx-xxxx-xx xxx-xxxx-xx-xxx
> MKFS_OPTIONS  -- -f -bsize=4096 /dev/mapper/scratchdev
> MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/mapper/scratchdev /mnt/scratch
> 
> xfs/030 24s ...  25s
> Ran: xfs/030
> Passed all 1 tests
> 
> > 
> > --D
> > 
> > > --D
> > > 
> > > > > Darrick, do you know more about that?
> > > > > 
> > > > > Thanks,
> > > > > Zorro
> > > > > 
> > > > > > > xfs/148 is a clone of test 030 using xfs_prepair64 instead of xfs_repair.
> > > > > > > xfs/149 is a clone of test 031 using xfs_prepair instead of xfs_repair
> > > > > I'm not worried about it too much, due to it always 'not run' and never
> > > > > fails.
> > > > 
> > > > Yes. But I prefer to remove them because IMO they are useless.
> > > > > 
> > > > 
> > > > > xfs/148 [not run] parallel repair binary xfs_prepair64 is not installed
> > > > > xfs/149 [not run] parallel repair binary xfs_prepair is not installed
> > > > > Ran: xfs/148 xfs/149
> > > > > Not run: xfs/148 xfs/149
> > > > > Passed all 2 tests
> > > > > 
> > > > 
> > > > 


