Re: [PATCH v2] fstests: btrfs: a new test case to verify a use-after-free bug

Filipe Manana <fdmanana@xxxxxxxxxx> · Mon, 26 Aug 2024 13:45:15 +0100

On Mon, Aug 26, 2024 at 1:14 PM Filipe Manana <fdmanana@xxxxxxxxxx> wrote:
>
> On Sat, Aug 24, 2024 at 11:31 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
> >
> > [BUG]
> > There is a use-after-free bug triggered very randomly by btrfs/125.
> >
> > If KASAN is enabled it can be triggered on certain setup.
> > Or it can lead to crash.
> >
> > [CAUSE]
> > The test case btrfs/125 is using RAID5 for metadata, which has a known
> > RMW problem if the there is some corruption on-disk.
> >
> > RMW will use the corrupted contents to generate a new parity, losing the
> > final chance to rebuild the contents.
> >
> > This is specific to metadata, as for data we have extra data checksum,
> > but the metadata has extra problems like possible deadlock due to the
> > extra metadata read/recovery needed to search the extent tree.
> >
> > And we know this problem for a while but without a better solution other
> > than avoid using RAID56 for metadata:
> >
> > >   Metadata
> > >       Do not use raid5 nor raid6 for metadata. Use raid1 or raid1c3
> > >       respectively.
> >
> > Combined with the above csum tree corruption, since RAID5 is stripe
> > based, btrfs needs to split its read bios according to stripe boundary.
> > And after a split, do a csum tree lookup for the expected csum.
>
> This is way too much unrelated stuff.
> The problem may have been triggered sporadically by btrfs/125, but
> there's no need to go into details on raid5 problems in btrfs.
> Even because the use-after-free bug can be triggered without raid5,
> just using raid0 like in the test case introduced by this patch.
>
> >
> > But if that csum lookup failed, in the error path btrfs doesn't handle
> > the split bios properly and lead to double freeing of the original bio
> > (the one containing the bio vectors).
> >
> > [NEW TEST CASE]
> > Unlike the original btrfs/125, which is very random and picky to
> > reproduce, introduce a new test case to verify the specific behavior by:
> >
> > - Create a btrfs with enough csum leaves
> >   To bump the csum tree level, use the minimal nodesize possible (4K).
> >   Writing 32M data which needs at least 8 leaves for data checksum
> >
> > - Find the last csum tree leave and corrupt it
> >
> > - Read the data many times until we trigger the bug or exit gracefully
> >   With an x86_64 VM (which is never able to trigger btrfs/125 failure)
> >   with KASAN enabled, it can trigger the KASAN report in just 4
> >   iterations (the default iteration number is 32).
> >
> > Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> > ---
> > Changelog:
> > v2:
> > - Fix the wrong commit hash
> >   The proper fix is not yet merged, the old hash is a place holder
> >   copied from another test case but forgot to remove.
> >
> > - Minor wording update
> >
> > - Add to "dangerous" group
> > ---
> >  tests/btrfs/319     | 84 +++++++++++++++++++++++++++++++++++++++++++++
> >  tests/btrfs/319.out |  2 ++
> >  2 files changed, 86 insertions(+)
> >  create mode 100755 tests/btrfs/319
> >  create mode 100644 tests/btrfs/319.out
> >
> > diff --git a/tests/btrfs/319 b/tests/btrfs/319
> > new file mode 100755
> > index 00000000..4be2b50b
> > --- /dev/null
> > +++ b/tests/btrfs/319
>
> There's already a btrfs/319 test case in for-next btw. This needs to
> be renumbered.
>
> > @@ -0,0 +1,84 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved.
> > +#
> > +# FS QA Test 319
> > +#
> > +# Make sure data csum lookup failure will not lead to double bio freeing
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto quick dangerous
>
> Missing the "raid" group.
>
> > +
> > +_require_scratch

Also this is not needed, because below we do a call to:
_require_scratch_nocheck

Finally the subject could be more specific like for example:

"btrfs: test reading data with a corrupted checksum tree leaf"

Since the test serves to check there are no use-after-free, crashes,
deadlocks, etc, when reading data which has its checksums in a
corrupted csum tree block.

Thanks.

> > +_fixed_by_kernel_commit xxxxxxxxxxxx \
> > +       "btrfs: fix a use-after-free bug when hitting errors inside btrfs_submit_chunk()"
> > +
> > +# The final fs will have a corrupted csum tree, which will never pass fsck
> > +_require_scratch_nocheck
> > +_require_scratch_dev_pool 2
> > +
> > +# Use RAID0 for data to get bio splitted according to stripe boundary.
> > +# This is required to trigger the bug.
> > +_check_btrfs_raid_type raid0
> > +
> > +# This test goes 4K sectorsize and 4K nodesize, so that we easily create
> > +# higher level of csum tree.
> > +_require_btrfs_support_sectorsize 4096
> > +
> > +# The bug itself has a race window, run this many times to ensure triggering.
> > +# On an x86_64 VM with KASAN enabled, normally it is triggered before the 10th run.
> > +runtime=32
>
> This is a confusing name because it actually means iterations in the
> for loop below, and not a time duration.
> I would suggest renaming it to "iterations" for example, or just get
> rid of it since it's only used in the for loop's condition.
>
> > +
> > +_scratch_pool_mkfs "-d raid0 -m single -n 4k -s 4k" >> $seqres.full 2>&1
> > +# This test requires data checksum to trigger the bug.
> > +_scratch_mount -o datasum,datacow
> > +
> > +# For the smallest csum size (CRC32C) it's 4 bytes per 4K, writing 32M data
> > +# will need 32K data checksum at minimal, which is at least 8 leaves.
> > +_pwrite_byte 0xef 0 32m "$SCRATCH_MNT/foobar" > /dev/null
> > +sync
>
> What's this sync for?
> We just do a unmount right after it, which makes it pointless and confusing.
>
> > +_scratch_unmount
> > +
> > +# Search for the last leaf of the csum tree, that will be the target to destroy.
> > +$BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV >> $seqres.full
>
> Please use the full command name: inspect -> inspect-internal
>
> > +target_bytenr=$($BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV | grep "leaf.*flags" | sort | tail -n1 | cut -f2 -d\ )
>
> Same here.
>
> Also, missing at the top a:
>
> _require_btrfs_command inspect-internal dump-tree
>
> Also we're passing the symbolic name "csum" to -t, which not all
> versions of btrfs-progs support.
> The support was added in btrfs-progs 4.5 (commit
> 69874af7b81519e40db9d92efa6beebee4220c63).
>
> > +
> > +if [ -z "$target_bytenr" ]; then
> > +       _fail "unable to locate the last csum tree leave"
>
> leave -> leaf
>
> > +fi
> > +
> > +echo "bytenr of csum tree leave to corrupt: $target_bytenr" >> $seqres.full
>
> leave -> leaf
>
> > +
> > +# Corrupt that csum tree block.
> > +physical=$(_btrfs_get_physical "$target_bytenr" 1)
> > +dev=$(_btrfs_get_device_path "$target_bytenr" 1)
> > +
> > +echo "physical bytenr: $physical" >> $seqres.full
> > +echo "physical device: $dev" >> $seqres.full
> > +
> > +_pwrite_byte 0x00 "$physical" 4 "$dev" > /dev/null
> > +
> > +for (( i = 0; i < $runtime; i++ )); do
> > +       echo "=== run $i/$runtime ===" >> $seqres.full
> > +       _scratch_mount -o ro
> > +       # Since the data is on RAID0, read bios will be split at the stripe
> > +       # (64K sized) boundary. If csum lookup failed due to corrupted csum
> > +       # tree, there is a race window that can lead to double bio freeing
> > +       # (triggering KASAN at least).
> > +       cat "$SCRATCH_MNT/foobar" &> /dev/null
> > +       _scratch_unmount
> > +
> > +       # Manually check the dmesg for "BUG", and do not call _check_dmesg()
> > +       # as it will clear 'check_dmesg' file and skips the final check after
> > +       # the test.
> > +       # For now just focus on the "BUG:" line from KASAN.
> > +       if _check_dmesg_for "BUG" ; then
> > +               _fail "Critical error(s) found in dmesg"
> > +       fi
>
> Why do the check manually? The check script calls _check_dmesg when a
> test finishes and if it finds 'BUG:' there, it will make the test
> fail.
> So there's no need to do the unmount and call _check_dmesg_for.
>
> Thanks.
>
> > +done
> > +
> > +echo "Silence is golden"
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/btrfs/319.out b/tests/btrfs/319.out
> > new file mode 100644
> > index 00000000..d40c929a
> > --- /dev/null
> > +++ b/tests/btrfs/319.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 319
> > +Silence is golden
> > --
> > 2.46.0
> >
> >