On Mon, Aug 26, 2024 at 1:14 PM Filipe Manana <fdmanana@xxxxxxxxxx> wrote: > > On Sat, Aug 24, 2024 at 11:31 AM Qu Wenruo <wqu@xxxxxxxx> wrote: > > > > [BUG] > > There is a use-after-free bug triggered very randomly by btrfs/125. > > > > If KASAN is enabled it can be triggered on certain setup. > > Or it can lead to crash. > > > > [CAUSE] > > The test case btrfs/125 is using RAID5 for metadata, which has a known > > RMW problem if the there is some corruption on-disk. > > > > RMW will use the corrupted contents to generate a new parity, losing the > > final chance to rebuild the contents. > > > > This is specific to metadata, as for data we have extra data checksum, > > but the metadata has extra problems like possible deadlock due to the > > extra metadata read/recovery needed to search the extent tree. > > > > And we know this problem for a while but without a better solution other > > than avoid using RAID56 for metadata: > > > > > Metadata > > > Do not use raid5 nor raid6 for metadata. Use raid1 or raid1c3 > > > respectively. > > > > Combined with the above csum tree corruption, since RAID5 is stripe > > based, btrfs needs to split its read bios according to stripe boundary. > > And after a split, do a csum tree lookup for the expected csum. > > This is way too much unrelated stuff. > The problem may have been triggered sporadically by btrfs/125, but > there's no need to go into details on raid5 problems in btrfs. > Even because the use-after-free bug can be triggered without raid5, > just using raid0 like in the test case introduced by this patch. > > > > > But if that csum lookup failed, in the error path btrfs doesn't handle > > the split bios properly and lead to double freeing of the original bio > > (the one containing the bio vectors). > > > > [NEW TEST CASE] > > Unlike the original btrfs/125, which is very random and picky to > > reproduce, introduce a new test case to verify the specific behavior by: > > > > - Create a btrfs with enough csum leaves > > To bump the csum tree level, use the minimal nodesize possible (4K). > > Writing 32M data which needs at least 8 leaves for data checksum > > > > - Find the last csum tree leave and corrupt it > > > > - Read the data many times until we trigger the bug or exit gracefully > > With an x86_64 VM (which is never able to trigger btrfs/125 failure) > > with KASAN enabled, it can trigger the KASAN report in just 4 > > iterations (the default iteration number is 32). > > > > Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> > > --- > > Changelog: > > v2: > > - Fix the wrong commit hash > > The proper fix is not yet merged, the old hash is a place holder > > copied from another test case but forgot to remove. > > > > - Minor wording update > > > > - Add to "dangerous" group > > --- > > tests/btrfs/319 | 84 +++++++++++++++++++++++++++++++++++++++++++++ > > tests/btrfs/319.out | 2 ++ > > 2 files changed, 86 insertions(+) > > create mode 100755 tests/btrfs/319 > > create mode 100644 tests/btrfs/319.out > > > > diff --git a/tests/btrfs/319 b/tests/btrfs/319 > > new file mode 100755 > > index 00000000..4be2b50b > > --- /dev/null > > +++ b/tests/btrfs/319 > > There's already a btrfs/319 test case in for-next btw. This needs to > be renumbered. > > > @@ -0,0 +1,84 @@ > > +#! /bin/bash > > +# SPDX-License-Identifier: GPL-2.0 > > +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved. > > +# > > +# FS QA Test 319 > > +# > > +# Make sure data csum lookup failure will not lead to double bio freeing > > +# > > +. ./common/preamble > > +_begin_fstest auto quick dangerous > > Missing the "raid" group. > > > + > > +_require_scratch Also this is not needed, because below we do a call to: _require_scratch_nocheck Finally the subject could be more specific like for example: "btrfs: test reading data with a corrupted checksum tree leaf" Since the test serves to check there are no use-after-free, crashes, deadlocks, etc, when reading data which has its checksums in a corrupted csum tree block. Thanks. > > +_fixed_by_kernel_commit xxxxxxxxxxxx \ > > + "btrfs: fix a use-after-free bug when hitting errors inside btrfs_submit_chunk()" > > + > > +# The final fs will have a corrupted csum tree, which will never pass fsck > > +_require_scratch_nocheck > > +_require_scratch_dev_pool 2 > > + > > +# Use RAID0 for data to get bio splitted according to stripe boundary. > > +# This is required to trigger the bug. > > +_check_btrfs_raid_type raid0 > > + > > +# This test goes 4K sectorsize and 4K nodesize, so that we easily create > > +# higher level of csum tree. > > +_require_btrfs_support_sectorsize 4096 > > + > > +# The bug itself has a race window, run this many times to ensure triggering. > > +# On an x86_64 VM with KASAN enabled, normally it is triggered before the 10th run. > > +runtime=32 > > This is a confusing name because it actually means iterations in the > for loop below, and not a time duration. > I would suggest renaming it to "iterations" for example, or just get > rid of it since it's only used in the for loop's condition. > > > + > > +_scratch_pool_mkfs "-d raid0 -m single -n 4k -s 4k" >> $seqres.full 2>&1 > > +# This test requires data checksum to trigger the bug. > > +_scratch_mount -o datasum,datacow > > + > > +# For the smallest csum size (CRC32C) it's 4 bytes per 4K, writing 32M data > > +# will need 32K data checksum at minimal, which is at least 8 leaves. > > +_pwrite_byte 0xef 0 32m "$SCRATCH_MNT/foobar" > /dev/null > > +sync > > What's this sync for? > We just do a unmount right after it, which makes it pointless and confusing. > > > +_scratch_unmount > > + > > +# Search for the last leaf of the csum tree, that will be the target to destroy. > > +$BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV >> $seqres.full > > Please use the full command name: inspect -> inspect-internal > > > +target_bytenr=$($BTRFS_UTIL_PROG inspect dump-tree -t csum $SCRATCH_DEV | grep "leaf.*flags" | sort | tail -n1 | cut -f2 -d\ ) > > Same here. > > Also, missing at the top a: > > _require_btrfs_command inspect-internal dump-tree > > Also we're passing the symbolic name "csum" to -t, which not all > versions of btrfs-progs support. > The support was added in btrfs-progs 4.5 (commit > 69874af7b81519e40db9d92efa6beebee4220c63). > > > + > > +if [ -z "$target_bytenr" ]; then > > + _fail "unable to locate the last csum tree leave" > > leave -> leaf > > > +fi > > + > > +echo "bytenr of csum tree leave to corrupt: $target_bytenr" >> $seqres.full > > leave -> leaf > > > + > > +# Corrupt that csum tree block. > > +physical=$(_btrfs_get_physical "$target_bytenr" 1) > > +dev=$(_btrfs_get_device_path "$target_bytenr" 1) > > + > > +echo "physical bytenr: $physical" >> $seqres.full > > +echo "physical device: $dev" >> $seqres.full > > + > > +_pwrite_byte 0x00 "$physical" 4 "$dev" > /dev/null > > + > > +for (( i = 0; i < $runtime; i++ )); do > > + echo "=== run $i/$runtime ===" >> $seqres.full > > + _scratch_mount -o ro > > + # Since the data is on RAID0, read bios will be split at the stripe > > + # (64K sized) boundary. If csum lookup failed due to corrupted csum > > + # tree, there is a race window that can lead to double bio freeing > > + # (triggering KASAN at least). > > + cat "$SCRATCH_MNT/foobar" &> /dev/null > > + _scratch_unmount > > + > > + # Manually check the dmesg for "BUG", and do not call _check_dmesg() > > + # as it will clear 'check_dmesg' file and skips the final check after > > + # the test. > > + # For now just focus on the "BUG:" line from KASAN. > > + if _check_dmesg_for "BUG" ; then > > + _fail "Critical error(s) found in dmesg" > > + fi > > Why do the check manually? The check script calls _check_dmesg when a > test finishes and if it finds 'BUG:' there, it will make the test > fail. > So there's no need to do the unmount and call _check_dmesg_for. > > Thanks. > > > +done > > + > > +echo "Silence is golden" > > + > > +# success, all done > > +status=0 > > +exit > > diff --git a/tests/btrfs/319.out b/tests/btrfs/319.out > > new file mode 100644 > > index 00000000..d40c929a > > --- /dev/null > > +++ b/tests/btrfs/319.out > > @@ -0,0 +1,2 @@ > > +QA output created by 319 > > +Silence is golden > > -- > > 2.46.0 > > > >