Re: [PATCH v3] btrfs: add test case to verify the behavior with large RAID0 data chunks

Filipe Manana <fdmanana@xxxxxxxxxx> · Wed, 21 Jun 2023 09:51:47 +0100

On Wed, Jun 21, 2023 at 9:46 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
>
> A regression was introduced during the v6.4 merge window: a u32 left
> shift overflow can cause problems with large data chunks (over 4G in
> size).
>
> This is the regression test case for it.
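To make the overflow concrete, here is a minimal bash sketch. It assumes the kernel shifts a u32 `stripe_nr` left by 16 (a 64K stripe length); `stripe_len_shift`, `offset`, `wide` and `narrow` are illustrative names only. Bash arithmetic is 64-bit, so the u32 result is emulated by masking to 32 bits:

```shell
#!/bin/bash
# Assumption: 64K stripe length, i.e. a left shift of 16.
stripe_len_shift=16

# A byte offset 5G into a data chunk, only possible with chunks over 4G.
offset=$((5 * 1024 * 1024 * 1024))
stripe_nr=$((offset >> stripe_len_shift))

# Shifting back in 64 bits recovers the offset; truncating the shifted
# value to 32 bits (as a u32 would) wraps it around past 4G.
wide=$((stripe_nr << stripe_len_shift))
narrow=$(( (stripe_nr << stripe_len_shift) & 0xFFFFFFFF ))

echo "64-bit offset: $wide"
echo "u32 offset:    $narrow"
```

The 5G offset wraps to 1G, so any I/O or trim computed from the truncated value would land in the wrong place inside the chunk.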
>
> The test case does the following:
>
> - Create a filesystem with a single 6G RAID0 data chunk
>   This requires at least 6 devices from SCRATCH_DEV_POOL, each at least
>   2G in size.
>
> - Fill the fs with 5G of data
>
> - Make sure everything is fine
>   This includes verifying the data csums.
>
> - Delete half of the data
>
> - Run fstrim
>   On an unpatched kernel this leads to an out-of-bounds trim.
>
> - Make sure everything is fine again
>   On an unpatched kernel, the bad fstrim above may have corrupted
>   data.
>
> For now, this test case only checks the behavior of RAID0.
> RAID10 would need 12 devices, which is out of reach for a lot of
> test environments.
>
> RAID56 will need a different test case, as btrfs does not support
> discard inside RAID56 chunks.
>
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>

Looks good now, thanks.

Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>

> ---
> Changelog:
> v2:
> - Add requirement for fstrim and batched discard support
> - Add some comments on why it's safe as long as each device is larger
>   than 2G
> - Use the nodiscard mount option to increase the chance of
>   crash/corruption
>   Newer kernels use async discard by default and have an extra trim
>   cache to avoid duplicated trim commands.
>   Disable such discard behavior so that fstrim can always trigger the
>   bug.
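A hypothetical helper (not part of fstests) that a test could use to confirm that neither sync nor async discard ended up active after mounting. It assumes discard appears in the mount options as `discard` or `discard=<mode>`; the name `has_discard` is illustrative:

```shell
#!/bin/bash
# Hypothetical helper: succeed if a comma-separated mount option string
# enables discard in any mode. The body runs in a subshell, so the IFS
# and "set -f" changes do not leak into the caller.
has_discard() (
	IFS=,
	set -f	# never treat an option as a glob pattern
	for opt in $1; do
		case "$opt" in
		# Whole-word match: "nodiscard" does not hit this branch.
		discard|discard=*) exit 0 ;;
		esac
	done
	exit 1
)
```

For example, `has_discard "$(findmnt -n -o OPTIONS "$SCRATCH_MNT")" && echo "discard still enabled"` would flag a mount where `nodiscard` did not take effect.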
>
> v3:
> - Use the merged fix commit in _fixed_by_kernel_commit
> - Add the missing _scratch_dev_pool_put() calls before _fail()/_notrun()
> - Fix the spelling and grammar of a comment
> - Update the error message if we detected a corruption after fstrim
> - Use $XFS_IO_PROG instead of direct xfs_io calls
> ---
>  tests/btrfs/292     | 91 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/292.out |  2 +
>  2 files changed, 93 insertions(+)
>  create mode 100755 tests/btrfs/292
>  create mode 100644 tests/btrfs/292.out
>
> diff --git a/tests/btrfs/292 b/tests/btrfs/292
> new file mode 100755
> index 00000000..32a7c3c5
> --- /dev/null
> +++ b/tests/btrfs/292
> @@ -0,0 +1,91 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2023 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 292
> +#
> +# Test btrfs behavior with large chunks (size beyond 4G) for basic read-write
> +# and discard.
> +# This test focuses on RAID0.
> +#
> +. ./common/preamble
> +_begin_fstest auto raid volume trim
> +
> +. ./common/filter
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_require_scratch_dev_pool 6
> +_require_fstrim
> +_fixed_by_kernel_commit a7299a18a179 \
> +       "btrfs: fix u32 overflows when left shifting @stripe_nr"
> +
> +_scratch_dev_pool_get 6
> +
> +
> +datasize=$((5 * 1024 * 1024 * 1024))
> +filesize=$((8 * 1024 * 1024))
> +nr_files=$(($datasize / $filesize))
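The sizes above work out as follows. This sketch repeats the arithmetic and also counts what the later "delete half of the data" step removes, assuming every file name ends in its index, as the fill loop creates them:

```shell
#!/bin/bash
# Same sizes as the test: 5G of data written as 8M files.
datasize=$((5 * 1024 * 1024 * 1024))
filesize=$((8 * 1024 * 1024))
nr_files=$((datasize / filesize))

# Names ending in 0, 2, 4, 6 or 8 are the even-numbered files, i.e.
# every other file, so deleting them removes half of the data.
nr_deleted=$(( (nr_files + 1) / 2 ))
deleted_bytes=$((nr_deleted * filesize))

echo "files written: $nr_files"
echo "files deleted: $nr_deleted ($((deleted_bytes / 1024 / 1024))M)"
```

That is 640 files written and 320 deleted, i.e. 2560M (2.5G) freed before fstrim runs.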
> +
> +# Make sure each device has at least 2G.
> +# Btrfs limits the per-device stripe length of a data chunk to 1G.
> +# Double that so that we can ensure a 6G RAID0 data chunk.
> +for i in $SCRATCH_DEV_POOL; do
> +       devsize=$(blockdev --getsize64 "$i")
> +       if [ $devsize -lt $((2 * 1024 * 1024 * 1024)) ]; then
> +               _scratch_dev_pool_put
> +               _notrun "device $i is too small, need at least 2G"
> +       fi
> +done
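A quick sketch of the sizing arithmetic. It takes the 1G per-device stripe limit from the comment as given and does not model any other limit the btrfs chunk allocator may apply; it also reproduces the 12-device figure the commit message gives for RAID10:

```shell
#!/bin/bash
# Figures from the comment above the loop.
max_dev_stripe=$((1024 * 1024 * 1024))	# 1G per device per data chunk
nr_devs=6

# RAID0 keeps no redundant copy, so every device stripe holds data.
raid0_chunk=$((nr_devs * max_dev_stripe))

# RAID10 mirrors each stripe, so the same chunk size needs twice as
# many devices.
raid10_devs=$((2 * raid0_chunk / max_dev_stripe))

# The test's "at least 2G" requirement: double the per-device stripe.
min_devsize=$((2 * max_dev_stripe))

over_4g=no
[ "$raid0_chunk" -gt $((1 << 32)) ] && over_4g=yes

echo "RAID0 chunk size:    $((raid0_chunk >> 30))G (over 4G: $over_4g)"
echo "RAID10 devices:      $raid10_devs"
echo "minimum device size: $((min_devsize >> 30))G"
```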
> +
> +_scratch_pool_mkfs -m raid1 -d raid0 >> $seqres.full 2>&1
> +
> +# Disable both sync and async discard, so that btrfs won't cache
> +# discard results, which could cause some trim ranges to be skipped.
> +_scratch_mount -o nodiscard
> +_require_batched_discard $SCRATCH_MNT
> +
> +# Fill the data chunk with 5G data.
> +for (( i = 0; i < $nr_files; i++ )); do
> +       $XFS_IO_PROG -f -c "pwrite -i /dev/urandom 0 $filesize" \
> +               $SCRATCH_MNT/file_$i > /dev/null
> +done
> +sync
> +echo "=== With initial 5G data written ===" >> $seqres.full
> +$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
> +
> +_scratch_unmount
> +
> +# Make sure we haven't corrupted anything.
> +$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
> +if [ $? -ne 0 ]; then
> +       _scratch_dev_pool_put
> +       _fail "data corruption detected after initial data filling"
> +fi
> +
> +_scratch_mount -o nodiscard
> +# Delete half of the data, and do discard
> +rm -rf -- "$SCRATCH_MNT"/file_*[02468]
> +sync
> +$FSTRIM_PROG $SCRATCH_MNT
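The deletion step relies on the shell expanding `*[02468]`, which only happens when the pattern sits outside quotes. Note also that a lone `-` is not an end-of-options marker for rm (that is `--`); rm treats it as a file name. A small sketch in a throwaway directory (names are illustrative) shows the quoting difference:

```shell
#!/bin/bash
# Create file_0 .. file_9 in a throwaway directory.
dir=$(mktemp -d)
i=0
while [ "$i" -lt 10 ]; do
	: > "$dir/file_$i"
	i=$((i + 1))
done

# Whole pattern inside double quotes: no glob expansion, so rm receives
# the literal name ".../*[02468]", which does not exist, and -f hides it.
rm -f -- "$dir/*[02468]"
after_quoted=$(ls "$dir" | wc -l)

# Pattern outside the quotes: the shell expands it to the even-numbered
# files before rm runs.
rm -f -- "$dir"/file_*[02468]
after_unquoted=$(ls "$dir" | wc -l)

echo "after quoted pattern:   $after_quoted files"
echo "after unquoted pattern: $after_unquoted files"
rm -rf -- "$dir"
```

With the quoted form nothing is deleted, so fstrim would have no freed ranges from the deleted files to work on.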
> +
> +echo "=== With 2.5G data trimmed ===" >> $seqres.full
> +$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >> $seqres.full
> +_scratch_unmount
> +
> +# Make sure fstrim doesn't corrupt anything.
> +$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
> +if [ $? -ne 0 ]; then
> +       _scratch_dev_pool_put
> +       _fail "data corruption detected after running fstrim"
> +fi
> +
> +_scratch_dev_pool_put
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/292.out b/tests/btrfs/292.out
> new file mode 100644
> index 00000000..627309d3
> --- /dev/null
> +++ b/tests/btrfs/292.out
> @@ -0,0 +1,2 @@
> +QA output created by 292
> +Silence is golden
> --
> 2.39.0
>