On Mon, Apr 27, 2020 at 5:07 PM Arvind Raghavan <raghavan.arvind@xxxxxxxxx> wrote:
>
> Thanks for the feedback!
>
> On Sun, Apr 26, 2020 at 2:29 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > On Fri, Apr 24, 2020 at 7:52 PM Arvind Raghavan
> > <raghavan.arvind@xxxxxxxxx> wrote:
> > >
> > > This patch aims to add tests to the xfstest suite to check whether
> > > renamed files recover after a crash. These test cases were generated by
> > > CrashMonkey, a crash-consistency testing framework built at the SASLab
> > > at UT Austin.
> > >
> > > This patch batches 37 tests into a single test, each of which creates a
> > > file, renames it, syncs one of the files/parent directories, and ensures
> > > that the synced file/dir is equivalent before and after a crash.
> > >
> > > This test takes around 12-15 seconds to run locally and is thus placed
> > > in the 'quick' group.
> > >
> > > Signed-off-by: Arvind Raghavan <raghavan.arvind@xxxxxxxxx>
> > Nice work. Some things to improve...
> > > ---
> > >  tests/generic/484     | 179 ++++++++++++++++++++++++++++++++++++++++++
> > >  tests/generic/484.out |   2 +
> > >  tests/generic/group   |   1 +
> > >  3 files changed, 182 insertions(+)
> > >  create mode 100755 tests/generic/484
> > >  create mode 100644 tests/generic/484.out
> > >
> > > diff --git a/tests/generic/484 b/tests/generic/484
> > > new file mode 100755
> > > index 00000000..b27d828d
> > > --- /dev/null
> > > +++ b/tests/generic/484
> > > @@ -0,0 +1,179 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (c) 2020 The University of Texas at Austin. All Rights Reserved.
> > > +#
> > > +# FS QA Test 484
> > > +#
> > > +# Test case created by CrashMonkey
> > > +#
> > > +# Test that if we create and rename a file and persist it, the
> > > +# appropriate files exist.
> > > +#
> > > +seq=`basename $0`
> > > +seqres=$RESULT_DIR/$seq
> > > +echo "QA output created by $seq"
> > > +
> > > +here=`pwd`
> > > +tmp=/tmp/$$
> > > +status=1	# failure is the default!
> > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > +
> > > +_cleanup()
> > > +{
> > > +	_cleanup_flakey
> > > +	cd /
> > > +	rm -f $tmp.*
> > > +}
> > > +
> > > +# get standard environment, filters and checks
> > > +. ./common/rc
> > > +. ./common/filter
> > > +. ./common/dmflakey
> > > +
> > > +# 256MB in bytes
> > > +fssize=$((2**20 * 256))
> > > +
> > > +# remove previous $seqres.full before test
> > > +rm -f $seqres.full
> > > +
> > > +# real QA test starts here
> > > +_supported_fs generic
> > > +_supported_os Linux
> > > +_require_scratch_nocheck
> > > +_require_dm_target flakey
> > > +
> > > +# initialize scratch device
> > > +_scratch_mkfs_sized $fssize >> $seqres.full 2>&1
> >
> > Why limit the fs size?
> > In general this results in test coverage for non-real-life filesystems,
> > so sub-optimal IMO.
>
> It was an artifact from the original CrashMonkey test harness, but
> we can go ahead and remove it now.
>
> > > +_require_metadata_journaling $SCRATCH_DEV
> > > +_init_flakey
> > > +
> > > +stat_opt='-c %n - blocks: %b size: %s inode: %i links: %h'
> > > +
> > > +# Usage: general_stat [--data-only] file1 [--data-only] [file2] ..
> > > +# If the --data-only flag precedes a file, then only that file's
> > > +# data will be printed. If a file is synced, then general_stat
> > > +# should provide identical output before and after a crash.
> >
> > This helper seems overly complicated for the use case in this
> > test, but it is local to the test.
> > I understand it is auto generated for other tests, but that's not
> > the way to go.
> > I suggest you look into using the existing FSSUM_PROG tool.
>
> I agree that seems like a more elegant solution. We essentially
> have four cases that we need to check:
>
> (1) fsync file (meta + data)
> (2) fdatasync file
> (3) fsync dir
> (4) fdatasync dir
>
> FSSUM_PROG works great for (3) and (4), since it looks like it's
> intended to be used only on directories. md5sum works well for
> (2), but it doesn't capture the metadata needed for (1).
> FSSUM_PROG has a way to print a full "manifest", which contains
> the metadata/data hashes for each file, so I believe we can
> grep/cut it out of the manifest output to test for (1). I'll send
> out a V2 patch if you think this seems like a reasonable
> approach.
>
> Also, see below for a note on FSSUM_PROG.

Upon looking more closely into fssum, I've changed my mind. It looks
like fssum walks a directory recursively, adding all file names,
metadata, and data into a checksum. This is useful for a filesystem
with stronger guarantees, but under POSIX the only guarantees when
fsyncing a dir are that its immediate dirents are correct and that its
metadata is synced, not necessarily the metadata of its children.

Because of this, I believe it would be more readable and concise to
rewrite this helper rather than grep/cut the output of fssum. I can
clean it up by removing the echo statements and the case/shift loop and
turn it into something that concisely expresses the following (see the
rough sketch at the bottom of this mail):

    fsync file     -> stat file, md5sum file
    fdatasync file -> md5sum file
    fsync dir      -> stat dir, ls dir
    fdatasync dir  -> ls dir

Would those changes make more sense?

> > > +general_stat() {
> > > +	local data_only="false"
> > > +	while (( "$#" )); do
> > > +		case $1 in
> > > +		--data-only)
> > > +			data_only="true"
> > > +			;;
> > > +		*)
> > > +			local path="$1"
> > > +			echo "-- $path --"
> > > +			if [ ! -e "$path" ]; then
> > > +				echo "Doesn't exist!"
> > > +			elif [ "$data_only" = "true" ] && [ -d "$path" ]; then
> > > +				echo "Directory Data"
> > > +				[ -z "$(ls -A $path)" ] || ls -1 "$path" | sort
> >
> > [ -z "$(ls -A $path)" ] || seems to add nothing.
> >
> > > +			elif [ "$data_only" = "true" ]; then
> > > +				echo "File Data"
> > > +				od "$path"
> > > +			elif [ -d "$path" ]; then
> > > +				echo "Directory Metadata"
> > > +				stat "$stat_opt" "$path"
> > > +				echo "Directory Data"
> > > +				[ -z "$(ls -A $path)" ] || ls -1 "$path" | sort
> > > +			else
> > > +				echo "File Metadata"
> > > +				stat "$stat_opt" "$path"
> > > +				echo "File Data"
> > > +				od "$path"
> > > +			fi
> > > +			data_only="false"
> > > +			;;
> > > +		esac
> > > +		shift
> > > +	done
> > > +}
> > > +
> > > +check_consistency()
> > > +{
> > > +	before=$(general_stat "$@")
> > > +	_flakey_drop_and_remount
> > > +	after=$(general_stat "$@")
> > > +
> > > +	if [ "$before" != "$after" ]; then
> > > +		echo -e "Before:\n$before"
> > > +		echo -e "After:\n$after"
> > > +	fi
> > > +
> > > +	# Using _unmount_flakey nondeterministically fails here with
> > > +	# "target is busy". Calling 'sync' doesn't fix the issue. Lazy
> > > +	# unmount waits for lingering processes to exit.
> >
> > No way. None of the other flaky tests have this kludge.
> > You must be doing something wrong, or there is another bug
> > in the infrastructure that needs to be fixed instead of being papered over.
>
> Yeah, this was a bug on my end. I've fixed it and removed all of
> the lazy unmounts.
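
To be concrete, with that fixed I expect check_consistency() in V2 to
end with a plain _unmount_flakey, roughly along these lines (untested
sketch, still calling the V1 general_stat helper):

	check_consistency()
	{
		local before after

		before=$(general_stat "$@")
		_flakey_drop_and_remount
		after=$(general_stat "$@")

		# any difference between the pre- and post-crash summaries
		# shows up in the golden output diff
		if [ "$before" != "$after" ]; then
			echo -e "Before:\n$before"
			echo -e "After:\n$after"
		fi

		# no lazy unmount needed now that the bug on my side is fixed
		_unmount_flakey
		_check_scratch_fs $FLAKEY_DEV
	}
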
>
> > > +	$UMOUNT_PROG -l $SCRATCH_MNT
> > > +	_check_scratch_fs $FLAKEY_DEV
> > > +}
> > > +
> > > +clean_dir()
> > > +{
> > > +	_mount_flakey
> > > +	rm -rf $(find $SCRATCH_MNT/* | grep -v "lost+found")
> >
> > Use testdir=$SCRATCH_MNT/testdir as the base of your test and you
> > won't need this blacklisting of lost+found.
> >
> > > +	$UMOUNT_PROG -l $SCRATCH_MNT
> >
> > And maybe this is what you are doing wrong??
> > Why is this unmount lazy?
> >
> > > +}
> > > +
> > > +do_fsync_check() {
> > > +	local file="$1"
> > > +	local sync_op="$2"
> > > +
> > > +	if [ $sync_op == "fdatasync" ]; then
> > > +		$XFS_IO_PROG -c "fdatasync" $file
> >
> > You don't really need to put this statement inside if/else
> >
> > > +		check_consistency --data-only $file
> >
> > And you could probably also pass sync_op to check_consistency()
> > instead of converting it here. In other words, this helper does not
> > do much, so you could get rid of it. Up to you.
> >
> > Thanks,
> > Amir.
>
> I believe there is a minor bug in the fssum program. Below are
> the relevant lines from the help output.
>
>     usage: fssum <options> <path>
>     options:
>         ...
>         -[ugoamcdxe]: specify which fields to include in checksum
>                       calculation.
>         ...
>         x : include xattrs
>         ...
>         ...
>         -x path : exclude path when building checksum
>                   (multiple ok)
>         ...
>
> There seems to be a duplicate flag which prevents the user from
> passing in -x to include the xattrs field in checksum
> calculation. This can be fixed with a simple patch that changes
> the latter flag to "-i path" and updates the documentation to
> say "ignores path when building checksum". I've got such a commit
> built locally that I can send out in a patch if you think it
> makes sense. None of the tests in the repo use the flag AFAIK,
> but we will end up using it for the xattr tests.
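
To make the general_stat rewrite I described above concrete, here is
roughly the shape of the helper I have in mind for V2. This is an
untested sketch; the name check_paths and the local stat_fmt are just
placeholders for now:

	# Print everything about $path that should survive a crash, given
	# that it was persisted with $sync_op (fsync or fdatasync).
	# Usage: check_paths <fsync|fdatasync> path...
	check_paths()
	{
		local sync_op="$1"
		local stat_fmt='%n - blocks: %b size: %s inode: %i links: %h'
		local path
		shift

		for path in "$@"; do
			echo "-- $path --"
			if [ ! -e "$path" ]; then
				echo "Doesn't exist!"
				continue
			fi
			# fsync persists metadata as well as data
			if [ "$sync_op" = "fsync" ]; then
				stat -c "$stat_fmt" "$path"
			fi
			if [ -d "$path" ]; then
				# a directory's "data" is its list of dirents
				ls -1 "$path" | sort
			else
				md5sum < "$path"
			fi
		done
	}

check_consistency() would then just take the sync op as its first
argument and diff the check_paths output from before and after
_flakey_drop_and_remount.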