On Wed, Sep 18, 2019 at 04:10:50PM -0700, Darrick J. Wong wrote:
> On Wed, Sep 18, 2019 at 09:37:11AM -0700, Darrick J. Wong wrote:
> > On Wed, Sep 18, 2019 at 11:24:47AM +0800, Yang Xu wrote:
> > >
> > > on 2019/09/18 10:59, Zorro Lang wrote:
> > > > xfs/030 is weird; I found that a long time ago.
> > > >
> > > > If I do a 'whole disk mkfs' (_scratch_mkfs_xfs) before this sized mkfs:
> > > >
> > > >   _scratch_mkfs_xfs $DSIZE >/dev/null 2>&1
> > > >
> > > > everything looks clean and the test passes. I can't send a patch to do
> > > > this, because I don't know the reason.
> > >
> > > Yes. I also found yesterday that running _scratch_mkfs_xfs in xfs/030 can
> > > solve this problem. Or, we can adjust the _try_wipe_scratch_devs order in
> > > check (but I don't have enough reason to explain why adjusting it helps),
> > > as below:
> >
> > (Yeah, I don't see any obvious reason why that would change outcomes...)
> >
> > > --- a/check
> > > +++ b/check
> > > @@ -753,7 +753,6 @@ for section in $HOST_OPTIONS_SECTIONS; do
> > >  		# _check_dmesg depends on this log in dmesg
> > >  		touch ${RESULT_DIR}/check_dmesg
> > >  	fi
> > > -	_try_wipe_scratch_devs > /dev/null 2>&1
> > >  	if [ "$DUMP_OUTPUT" = true ]; then
> > >  		_run_seq 2>&1 | tee $tmp.out
> > >  		# Because $? would get tee's return code
> > > @@ -799,7 +798,7 @@ for section in $HOST_OPTIONS_SECTIONS; do
> > >  	# Scan for memory leaks after every test so that associating
> > >  	# a leak to a particular test will be as accurate as possible.
> > >  	_check_kmemleak || err=true
> > > -
> > > +	_try_wipe_scratch_devs > /dev/null 2>&1
> > >  	# test ends after all checks are done.
> > >  	$timestamp && _timestamp
> > >  	stop=`_wallclock`
> > >
> > > > I'm not familiar with xfs_repair so much, so I don't know what happens
> > > > underneath. I suppose the part after $DSIZE affects xfs_repair,
> > > > but I don't know why wipefs can cause that; wipefs only erases 4 bytes
> > > > at the beginning.
> > > >
> > > I am looking for the reason. It seems wipefs wipes important information,
> > > and the $DSIZE option (using a single agcount or dsize, it also fails)
> > > cannot format the disk completely. If we use other options, it passes.
> >
> > How does mkfs fail, specifically?
> >
> > Also, what's your storage configuration? And lsblk -D output?
>
> I'm still interested in the answer to these questions, but I've done a
> little more research and noticed that yes, xfs/030 fails if the device
> doesn't support zeroing discard.
>
> First, if mkfs.xfs detects an old primary superblock, it will write
> zeroes to all superblocks before formatting the new filesystem.
> Obviously this won't be done if the device doesn't have a primary
> superblock.
>
> (1) So let's say that a previous test formatted a 4GB scratch disk with
> all defaults, and let's say that we have 4 AGs. The disk will look like
> this:
>
> SB0 [1G space] SB1 [1G space] SB2 [1G space] SB3 [1G space]
>
> (2) Now we run _try_wipe_scratch_devs, which wipes out the primary label:
>
> 000 [1G space] SB1 [1G space] SB2 [1G space] SB3 [1G space]
>
> (3) Now xfs/030 runs its special mkfs command (6 AGs, 100MB disk). If the
> disk supports zeroing discard, it will discard the whole device:
>
> <4GB of zeroes>
>
> (4) Then it will lay down its own filesystem:
>
> SB0 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 4G>
>
> (5) Next, xfs/030 zaps the primary superblock:
>
> 000 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 4G>
>
> (6) Next, xfs/030 runs xfs_repair. It fails to find the primary sb, so it
> tries to find secondary superblocks. Its first strategy is to compute
> the fs geometry assuming all default options. In this case, that means
> 4 AGs, spaced 1G apart. They're all zero, so it falls back to a linear
> scan of the disk. It finds SB1, uses that to rewrite the primary super,
> and continues with the repair (which is mostly uneventful). The test
> passes; this is why it works on my computer.
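(A minimal sketch of what steps (5) and (6) amount to, assuming a 4GB
scratch device with 512-byte sectors and the default 4 x 1GiB layout;
not necessarily the exact commands the test and xfs_repair run:

	# step (5): zero the primary superblock at the front of the device
	$XFS_IO_PROG -c "pwrite -S 0 0 512" $SCRATCH_DEV

	# step (6): xfs_repair first probes for backup superblocks at the
	# default AG boundaries (1GiB, 2GiB, 3GiB here) and only falls back
	# to a linear scan of the device if none of them look valid
)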
>
> ---------
>
> Now let's see what happened before _try_wipe_scratch_devs. In step (3)
> mkfs would find the old superblocks and wipe them before laying down the
> new superblocks:
>
> SB0 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
> 000 [1G space] 000 [1G space] 000 [1G space]
>
> Step (5) zaps the primary, yielding:
>
> 000 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
> 000 [1G space] 000 [1G space] 000 [1G space]
>
> Step (6) fails to find a primary superblock so it tries to read backup
> superblocks at 1G, 2G, and 3G, but they're all zero, so it falls back to
> the linear scan, picks up SB1, and proceeds with a mostly uneventful
> repair. The test passes.
>
> ---------
>
> However, with _try_wipe_scratch_devs and a device that doesn't support
> discard (or MKFS_OPTIONS includes -K), we have a problem. mkfs.xfs
> doesn't discard the device nor does it find a primary superblock, so it
> simply formats the new filesystem. We end up with:
>
> SB0 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
> SB'1 [1G space] SB'2 [1G space] SB'3 [1G space]
>
> where SB[0-5] are from the filesystem that xfs/030 formatted but
> SB'[1-3] are from the filesystem that was on the scratch disk before
> xfs/030 even started. Uhoh.
>
> Step (5) zaps the primary, yielding:
>
> 000 [16M zeroes] SB1 [16M zeroes] <4 more AGs> <zeroes from 100M to 1G> \
> SB'1 [1G space] SB'2 [1G space] SB'3 [1G space]
>
> Step (6) fails to find a primary superblock so it tries to read backup
> superblocks at 1G. It finds SB'1 and uses that to reconstruct the /old/
> filesystem, with what looks like massive filesystem damage. This
> results in test failure. Oops.
>
> ----------
>
> The reason for adding _try_wipe_scratch_devs was to detect broken tests
> that started using the filesystem on the scratch device (if any) before
> (or without!) formatting the scratch device. That broken behavior could
> result in spurious test failures when xfstests was run in random order
> mode, either due to mounting an unformatted device or mounting a corrupt
> fs that some other test left behind.
>
> I guess a fix for XFS would be to have _try_wipe_scratch_devs try to read
> the primary superblock to compute the AG geometry and then erase all
> superblocks that could be on the disk; and then compute the default
> geometry and wipe out all those superblocks too.
>
> Does any of that square with what you've been seeing?

Thanks, Darrick. So what I supposed might be true?

"
> > > > I'm not familiar with xfs_repair so much, so I don't know what happens
> > > > underneath. I suppose the part after $DSIZE affects xfs_repair,
"

The sized mkfs.xfs (without discard) leaves the old on-disk structures
behind, beyond the $DSIZE space, which causes xfs_repair to pick up odd
things while checking. When I tried to erase the 1st block of each AG [1],
the test passed [2]. Is that what you talked about above?

Thanks,
Zorro

[1]
diff --git a/common/rc b/common/rc
index e0b087c1..19b7ab02 100644
--- a/common/rc
+++ b/common/rc
@@ -4048,6 +4048,10 @@ _try_wipe_scratch_devs()
 	for dev in $SCRATCH_DEV_POOL $SCRATCH_DEV $SCRATCH_LOGDEV $SCRATCH_RTDEV; do
 		test -b $dev && $WIPEFS_PROG -a $dev
 	done
+
+	if [ "$FSTYP" = "xfs" ];then
+		_try_wipe_scratch_xfs
+	fi
 }
 
 # Only run this on xfs if xfs_scrub is available and has the unicode checker
diff --git a/common/xfs b/common/xfs
index 1bce3c18..53f33d12 100644
--- a/common/xfs
+++ b/common/xfs
@@ -884,3 +884,24 @@ _xfs_mount_agcount()
 {
 	$XFS_INFO_PROG "$1" | grep agcount= | sed -e 's/^.*agcount=\([0-9]*\),.*$/\1/g'
 }
+
+_try_wipe_scratch_xfs()
+{
+	local tmp=`mktemp -u`
+
+	_scratch_mkfs_xfs -N 2>/dev/null | perl -ne '
+		if (/^meta-data=.*\s+agcount=(\d+), agsize=(\d+) blks/) {
+			print STDOUT "agcount=$1\nagsize=$2\n";
+		}
+		if (/^data\s+=\s+bsize=(\d+)\s/) {
+			print STDOUT "dbsize=$1\n";
+		}' > $tmp.mkfs
+	. $tmp.mkfs
+	if [ -n "$agcount" -a -n "$agsize" -a -n "$dbsize" ];then
+		for((i=0; i<agcount; i++)); do
+			$XFS_IO_PROG -c "pwrite $((i * dbsize * agsize)) $dbsize" \
+				$SCRATCH_DEV >/dev/null;
+		done
+	fi
+	rm -f $tmp.mkfs
+}
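(As a sanity check of the offsets that loop generates — the numbers here
assume a 4GB scratch device whose default geometry is agcount=4,
agsize=262144 blks, dbsize=4096; the real values come from the
mkfs.xfs -N parse above:

	# print the byte offset of the first block of each AG
	for ((i = 0; i < 4; i++)); do
		echo $((i * 262144 * 4096))
	done
	# -> 0, 1073741824, 2147483648, 3221225472

i.e. 0, 1GiB, 2GiB and 3GiB, which is exactly where the stale SB' copies
from the previous whole-disk format would sit.)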
[2]
# ./check xfs/030
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/x86_64 xxx-xxxx-xx xxx-xxxx-xx-xxx
MKFS_OPTIONS  -- -f -bsize=4096 /dev/mapper/scratchdev
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/mapper/scratchdev /mnt/scratch

xfs/030	 24s ...  25s
Ran: xfs/030
Passed all 1 tests

>
> --D
>
> > --D
> >
> > > > Darrick, do you know more about that?
> > > >
> > > > Thanks,
> > > > Zorro
> > > >
> > > > > > xfs/148 is a clone of test 030 using xfs_prepair64 instead of xfs_repair.
> > > > > > xfs/149 is a clone of test 031 using xfs_prepair instead of xfs_repair
> > > >
> > > > I'm not worried about it too much, since they are always 'not run' and
> > > > never fail.
> > >
> > > Yes. But I prefer to remove them because IMO they are useless.
> > > >
> > > > xfs/148 [not run] parallel repair binary xfs_prepair64 is not installed
> > > > xfs/149 [not run] parallel repair binary xfs_prepair is not installed
> > > > Ran: xfs/148 xfs/149
> > > > Not run: xfs/148 xfs/149
> > > > Passed all 2 tests
> > > >