Re: [PATCH 1/4] generic: require discard zero behavior for dmlogwrites on XFS

Amir Goldstein <amir73il@xxxxxxxxx> · Thu, 27 Aug 2020 09:58:09 +0300

On Wed, Aug 26, 2020 at 5:38 PM Brian Foster <bfoster@xxxxxxxxxx> wrote:
>
> Several generic fstests use dm-log-writes to test the filesystem for
> consistency at various crash recovery points. dm-log-writes and the
> associated replay mechanism rely on zeroing via discard to clear
> stale blocks when moving to various points in time of the fs. If the
> storage doesn't provide zeroing or the discard requests exceed the
> hardcoded maximum (128MB) of the fallback solution to physically
> write zeroes, stale blocks are left around in the target fs. This
> scheme is known to cause issues on XFS v5 superblocks if recovery
> observes metadata from a future variant of an fs that has been
> replayed to an older point in time. This corrupts the filesystem and
> leads to false test failures.
>
> generic/482 already works around this problem by using a thin volume
> as the target device, which provides consistent and efficient
> discard zeroing behavior, but other tests have seen similar issues
> on XFS. Add an XFS specific check to the dmlogwrites init time code
> that requires discard zeroing support and otherwise skips the test
> to avoid false positive failures.
>
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>  common/dmlogwrites | 10 ++++++++--
>  common/rc          | 14 ++++++++++++++
>  tests/generic/470  |  2 +-
>  3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/common/dmlogwrites b/common/dmlogwrites
> index 573f4b8a..92cc6ce2 100644
> --- a/common/dmlogwrites
> +++ b/common/dmlogwrites
> @@ -43,9 +43,10 @@ _require_log_writes_dax_mountopt()
>         _require_test_program "log-writes/replay-log"
>
>         local ret=0
> -       local mountopt=$1
> +       local dev=$1
> +       local mountopt=$2
>
> -       _log_writes_init $SCRATCH_DEV
> +       _log_writes_init $dev
>         _log_writes_mkfs > /dev/null 2>&1
>         _log_writes_mount "-o $mountopt" > /dev/null 2>&1
>         # Check options to be sure.
> @@ -66,6 +67,11 @@ _log_writes_init()
>         [ -z "$blkdev" ] && _fail \
>         "block dev must be specified for _log_writes_init"
>
> +       # XFS requires discard zeroing support on the target device to work
> +       # reliably with dm-log-writes. Use dm-thin devices in tests that want
> +       # to provide reliable discard zeroing support.
> +       [ $FSTYP == "xfs" ] && _require_discard_zeroes $blkdev
> +

I imagine that ext4 could also be burned by this.
Do we have a reason to limit this requirement to xfs?
I prefer to make it generic.

>         local BLK_DEV_SIZE=`blockdev --getsz $blkdev`
>         LOGWRITES_NAME=logwrites-test
>         LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME
> diff --git a/common/rc b/common/rc
> index aa5a7409..fedb5221 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -4313,6 +4313,20 @@ _require_mknod()
>         rm -f $TEST_DIR/$seq.null
>  }
>
> +# check that discard is supported and subsequent reads return zeroes
> +_require_discard_zeroes()
> +{
> +       local dev=$1
> +
> +       _require_command "$BLKDISCARD_PROG" blkdiscard
> +
> +       $XFS_IO_PROG -c "pwrite -S 0xcd 0 4k" $dev > /dev/null 2>&1 ||
> +               _fail "write error"
> +       $BLKDISCARD_PROG -o 0 -l 1m $dev || _notrun "no discard support"
> +       hexdump -n 4096 $dev | head -n 1 | grep cdcd &&
> +               _notrun "no discard zeroing support"
> +}
> +

I am fine with your solution, but if there was a discussion on the best way to
solve the problem, I missed it, so would like to hear what Chritoph has to say.

Thanks,
Amir.