On Mon, Apr 27, 2020 at 5:07 PM Arvind Raghavan <raghavan.arvind@xxxxxxxxx> wrote:
>
> Thanks for the feedback!
>
> On Sun, Apr 26, 2020 at 2:29 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > On Fri, Apr 24, 2020 at 7:52 PM Arvind Raghavan
> > <raghavan.arvind@xxxxxxxxx> wrote:
> > >
> > > This patch aims to add tests to the xfstest suite to check whether
> > > renamed files recover after a crash. These test cases were generated by
> > > CrashMonkey, a crash-consistency testing framework built at the SASLab
> > > at UT Austin.
> > >
> > > This patch batches 37 tests into a single test, each of which creates a
> > > file, renames it, syncs one of the files/parent directories, and ensures
> > > that the synced file/dir is equivalent before and after a crash.
> > >
> > > This test takes around 12-15 seconds to run locally and is thus placed
> > > in the 'quick' group.
> > >
> > > Signed-off-by: Arvind Raghavan <raghavan.arvind@xxxxxxxxx>
> > Nice work. Some things to improve...
> > > ---
> > >  tests/generic/484     | 179 ++++++++++++++++++++++++++++++++++++++++++
> > >  tests/generic/484.out |   2 +
> > >  tests/generic/group   |   1 +
> > >  3 files changed, 182 insertions(+)
> > >  create mode 100755 tests/generic/484
> > >  create mode 100644 tests/generic/484.out
> > >
> > > diff --git a/tests/generic/484 b/tests/generic/484
> > > new file mode 100755
> > > index 00000000..b27d828d
> > > --- /dev/null
> > > +++ b/tests/generic/484
> > > @@ -0,0 +1,179 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2020 The University of Texas at Austin. All Rights Reserved.
> > > +#
> > > +# FS QA Test 484
> > > +#
> > > +# Test case created by CrashMonkey
> > > +#
> > > +# Test that if we create and rename a file and persist it, the
> > > +# appropriate files exist.
> > > +#
> > > +seq=`basename $0`
> > > +seqres=$RESULT_DIR/$seq
> > > +echo "QA output created by $seq"
> > > +
> > > +here=`pwd`
> > > +tmp=/tmp/$$
> > > +status=1	# failure is the default!
> > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > +
> > > +_cleanup()
> > > +{
> > > +	_cleanup_flakey
> > > +	cd /
> > > +	rm -f $tmp.*
> > > +}
> > > +
> > > +# get standard environment, filters and checks
> > > +. ./common/rc
> > > +. ./common/filter
> > > +. ./common/dmflakey
> > > +
> > > +# 256MB in bytes
> > > +fssize=$((2**20 * 256))
> > > +
> > > +# remove previous $seqres.full before test
> > > +rm -f $seqres.full
> > > +
> > > +# real QA test starts here
> > > +_supported_fs generic
> > > +_supported_os Linux
> > > +_require_scratch_nocheck
> > > +_require_dm_target flakey
> > > +
> > > +# initialize scratch device
> > > +_scratch_mkfs_sized $fssize >> $seqres.full 2>&1
> >
> > Why limit the fs size?
> > In general this results in test coverage for non-real-life filesystems,
> > so sub-optimal IMO.
>
> It was an artifact from the original CrashMonkey test harness, but
> we can go ahead and remove it now.
>
> > > +_require_metadata_journaling $SCRATCH_DEV
> > > +_init_flakey
> > > +
> > > +stat_opt='-c %n - blocks: %b size: %s inode: %i links: %h'
> > > +
> > > +# Usage: general_stat [--data-only] file1 [--data-only] [file2] ..
> > > +# If the --data-only flag precedes a file, then only that file's
> > > +# data will be printed. If a file is synced, then general_stat
> > > +# should provide identical output before and after a crash.
> >
> > This helper seems overly complicated for the use case in this
> > test, but it is local to the test.
> > I understand it is auto generated for other tests, but that's not
> > the way to go.
> > I suggest you look into using the existing FSSUM_PROG tool.
>
> I agree that seems like a more elegant solution. We essentially
> have four cases that we need to check:
>
> (1) fsync file (meta + data)
> (2) fdatasync file
> (3) fsync dir
> (4) fdatasync dir
>
> FSSUM_PROG works great for (3) and (4), since it looks like it's
> intended to be used only on directories. md5sum works well for
> (2), but it doesn't capture the metadata needed for (1).
> FSSUM_PROG has a way to print a full "manifest", which contains
> the metadata/data hashes for each file, so I believe we can
> grep/cut it out of the manifest output to test for (1). I'll send
> out a V2 patch if you think this seems like a reasonable
> approach.
>
> Also, see below for a note on FSSUM_PROG.

Upon looking more closely into fssum, I've changed my mind. It looks
like fssum walks a directory recursively, adding all file names,
metadata, and data into a checksum. This is useful for a filesystem
with stronger guarantees, but under POSIX the only guarantees when
fsyncing a dir are that its immediate dirents are correct and that its
metadata is synced, not necessarily the metadata of its children.

Because of this, I believe it would be more readable and concise to
rewrite this helper rather than grep/cut the output of fssum. I can
clean it up by removing the echo statements and the case/shift loop and
turn it into something that concisely expresses the following (see the
rough sketch at the bottom of this mail):

    fsync file     -> stat file, md5sum file
    fdatasync file -> md5sum file
    fsync dir      -> stat dir, ls dir
    fdatasync dir  -> ls dir

Would those changes make more sense?

> > > +general_stat() {
> > > +	local data_only="false"
> > > +	while (( "$#" )); do
> > > +		case $1 in
> > > +		--data-only)
> > > +			data_only="true"
> > > +			;;
> > > +		*)
> > > +			local path="$1"
> > > +			echo "-- $path --"
> > > +			if [ ! -e "$path" ]; then
> > > +				echo "Doesn't exist!"
> > > +			elif [ "$data_only" = "true" ] && [ -d "$path" ]; then
> > > +				echo "Directory Data"
> > > +				[ -z "$(ls -A $path)" ] || ls -1 "$path" | sort
> >
> > [ -z "$(ls -A $path)" ] || seems to add nothing.
> >
> > > +			elif [ "$data_only" = "true" ]; then
> > > +				echo "File Data"
> > > +				od "$path"
> > > +			elif [ -d "$path" ]; then
> > > +				echo "Directory Metadata"
> > > +				stat "$stat_opt" "$path"
> > > +				echo "Directory Data"
> > > +				[ -z "$(ls -A $path)" ] || ls -1 "$path" | sort
> > > +			else
> > > +				echo "File Metadata"
> > > +				stat "$stat_opt" "$path"
> > > +				echo "File Data"
> > > +				od "$path"
> > > +			fi
> > > +			data_only="false"
> > > +			;;
> > > +		esac
> > > +		shift
> > > +	done
> > > +}
> > > +
> > > +check_consistency()
> > > +{
> > > +	before=$(general_stat "$@")
> > > +	_flakey_drop_and_remount
> > > +	after=$(general_stat "$@")
> > > +
> > > +	if [ "$before" != "$after" ]; then
> > > +		echo -e "Before:\n$before"
> > > +		echo -e "After:\n$after"
> > > +	fi
> > > +
> > > +	# Using _unmount_flakey nondeterministically fails here with
> > > +	# "target is busy". Calling 'sync' doesn't fix the issue. Lazy
> > > +	# unmount waits for lingering processes to exit.
> >
> > No way. None of the other flaky tests have this kludge.
> > You must be doing something wrong, or there is another bug
> > in the infrastructure that needs to be fixed instead of being papered over.
>
> Yeah, this was a bug on my end. I've fixed it and removed all of
> the lazy unmounts.
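
To be concrete, with that fixed I expect check_consistency() in V2 to
end with a plain _unmount_flakey, roughly along these lines (untested
sketch, still calling the V1 general_stat helper):

	check_consistency()
	{
		local before after

		before=$(general_stat "$@")
		_flakey_drop_and_remount
		after=$(general_stat "$@")

		# any difference between the pre- and post-crash summaries
		# shows up in the golden output diff
		if [ "$before" != "$after" ]; then
			echo -e "Before:\n$before"
			echo -e "After:\n$after"
		fi

		# no lazy unmount needed now that the bug on my side is fixed
		_unmount_flakey
		_check_scratch_fs $FLAKEY_DEV
	}
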
>
> > > +	$UMOUNT_PROG -l $SCRATCH_MNT
> > > +	_check_scratch_fs $FLAKEY_DEV
> > > +}
> > > +
> > > +clean_dir()
> > > +{
> > > +	_mount_flakey
> > > +	rm -rf $(find $SCRATCH_MNT/* | grep -v "lost+found")
> >
> > Use testdir=$SCRATCH_MNT/testdir as the base of your test and you
> > won't need this blacklisting of lost+found.
> >
> > > +	$UMOUNT_PROG -l $SCRATCH_MNT
> >
> > And maybe this is what you are doing wrong??
> > Why is this unmount lazy?
> >
> > > +}
> > > +
> > > +do_fsync_check() {
> > > +	local file="$1"
> > > +	local sync_op="$2"
> > > +
> > > +	if [ $sync_op == "fdatasync" ]; then
> > > +		$XFS_IO_PROG -c "fdatasync" $file
> >
> > You don't really need to put this statement inside if/else
> >
> > > +		check_consistency --data-only $file
> >
> > And you could probably also pass sync_op to check_consistency()
> > instead of converting it here. In other words, this helper does not
> > do much, so you could get rid of it. Up to you.
> >
> > Thanks,
> > Amir.
>
> I believe there is a minor bug in the fssum program. Below are
> the relevant lines from the help output.
>
>     usage: fssum <options> <path>
>     options:
>         ...
>         -[ugoamcdxe]: specify which fields to include in checksum
>                       calculation.
>         ...
>         x : include xattrs
>         ...
>         ...
>         -x path : exclude path when building checksum
>                   (multiple ok)
>         ...
>
> There seems to be a duplicate flag which prevents the user from
> passing in -x to include the xattrs field in checksum
> calculation. This can be fixed with a simple patch that changes
> the latter flag to "-i path" and updates the documentation to
> say "ignores path when building checksum". I've got such a commit
> built locally that I can send out in a patch if you think it
> makes sense. None of the tests in the repo use the flag AFAIK,
> but we will end up using it for the xattr tests.
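
To make the general_stat rewrite I described above concrete, here is
roughly the shape of the helper I have in mind for V2. This is an
untested sketch; the name check_paths and the local stat_fmt are just
placeholders for now:

	# Print everything about $path that should survive a crash, given
	# that it was persisted with $sync_op (fsync or fdatasync).
	# Usage: check_paths <fsync|fdatasync> path...
	check_paths()
	{
		local sync_op="$1"
		local stat_fmt='%n - blocks: %b size: %s inode: %i links: %h'
		local path
		shift

		for path in "$@"; do
			echo "-- $path --"
			if [ ! -e "$path" ]; then
				echo "Doesn't exist!"
				continue
			fi
			# fsync persists metadata as well as data
			if [ "$sync_op" = "fsync" ]; then
				stat -c "$stat_fmt" "$path"
			fi
			if [ -d "$path" ]; then
				# a directory's "data" is its list of dirents
				ls -1 "$path" | sort
			else
				md5sum < "$path"
			fi
		done
	}

check_consistency() would then just take the sync op as its first
argument and diff the check_paths output from before and after
_flakey_drop_and_remount.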