Re: [PATCH] btrfs: new test for logical inode resolution panic

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



On Wed, Dec 14, 2022 at 11:15 PM Boris Burkov <boris@xxxxxx> wrote:
>
> If we create a file that has an inline extent followed by a prealloc
> extent, then attempt to use the logical to inode ioctl on the prealloc
> extent, but in the overwritten range, backref resolution will process
> the inline extent. Depending on the leaf eb layout, this can panic.
> Add a new test for this condition. In the long run, we can add spew when
> we read out-of-bounds fields of inline extent items and simplify this
> test to look for dmesg warnings rather than trying to force a fairly
> fragile panic (dependent on non-standardized details of leaf layout).
>
> The test causes a kernel panic unless:
> btrfs: fix logical_ino ioctl panic
> is applied to the kernel.
>
> Signed-off-by: Boris Burkov <boris@xxxxxx>

So in addition to feedback already received from David and Zorro, I
have some comments inlined below.

> ---
>  tests/btrfs/279     | 95 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/279.out |  2 +
>  2 files changed, 97 insertions(+)
>  create mode 100755 tests/btrfs/279
>  create mode 100644 tests/btrfs/279.out
>
> diff --git a/tests/btrfs/279 b/tests/btrfs/279
> new file mode 100755
> index 00000000..ef77f84b
> --- /dev/null
> +++ b/tests/btrfs/279
> @@ -0,0 +1,95 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022 Meta Platforms, Inc.  All Rights Reserved.
> +#
> +# FS QA Test 279
> +#
> +# Given a file with extents:
> +# [0 : 4096) (inline)
> +# [4096 : N] (prealloc)
> +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the
> +# non-inline extent, it results in reading the offset field of the inline
> +# extent, which is meaningless (it is full of user data..). If we are
> +# particularly lucky, it can be past the end of the extent buffer, resulting in
> +# a crash.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/dmlogwrites
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_require_scratch
> +_require_xfs_io_command "falloc"

Here it should be:

_require_xfs_io_command "falloc" "-k"

> +_require_xfs_io_command "fsync"
> +_require_xfs_io_command "pwrite"

This is the first time I'm seeing a test requiring xfs_io to support
the pwrite and fsync commands.
Presumably there aren't any because either these commands always
existed or they have been around for a very long time.

> +
> +dump_tree() {
> +       $BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV
> +}
> +
> +get_extent_data() {
> +       local ino=$1
> +       dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA "
> +}
> +
> +get_prealloc_offset() {
> +       local ino=$1
> +       get_extent_data $ino | grep "disk byte" | awk '{print $5}'

In fstests we use $AWK_PROG instead of bare 'awk'.

> +}
> +
> +# This test needs to create conditions s.t. the special inode's inline extent
> +# is the first item in a leaf. Therefore, fix a leaf size and add
> +# items that are otherwise not necessary to reproduce the inline-prealloc
> +# condition to get to such a state.
> +#
> +# Roughly, the idea for getting the right item fill is to:
> +# 1. create an extra file with a variable sized inline extent item
> +# 2. create our evil file that will cause the panic
> +# 3. create a whole bunch of files to create a bunch of dir/index items
> +# 4. size the variable extent item s.t. the evil extent item is item 0 in the
> +#    next leaf
> +#
> +# We do it in this somewhat convoluted way because the dir and index items all
> +# come before any inode, inode_ref, or extent_data items. So we can predictably
> +# create a bunch of them, then sneak in a funny sized extent to round out the
> +# difference.
> +
> +_scratch_mkfs "--nodesize 16k" >/dev/null

So this will fail on a machine with a 64K page size (PPC for e.g.)
using a kernel or btrfs-progs without subpage sector size support.
That will make mkfs output an error to stderr and cause the test to fail.

Please use a 64K node size and adapt the number of files below so that
we get the problematic leaf layout to trigger
the bug.

Like this the test will be able to run, and reproduce the bug on an
unpatched kernel, on machines with any page size
and on kernels without subpage sector size support (thinking of SLE
kernels for example).

That's generally what we do when we need to exercise a particular leaf
layout and allow the test to run on machines
with any page size. For example btrfs/239 and btrfs/154 do this.

> +_scratch_mount
> +
> +f=$SCRATCH_MNT/f
> +
> +# the variable extra "leaf padding" file
> +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad
> +
> +# the evil file with an inline extent followed by a prealloc extent
> +# created by falloc with keep-size, then two non-truncating writes to the front
> +touch $f.evil
> +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil
> +$XFS_IO_PROG -fc fsync $f.evil
> +ino=$(stat -c '%i' $f.evil)
> +logical=$(get_prealloc_offset $ino)
> +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
> +$XFS_IO_PROG -fc fsync $f.evil
> +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
> +$XFS_IO_PROG -fc fsync $f.evil
> +sync

This is complex, and it doesn't provide any comments regarding:

1) Why all this frenzy of fsync and a final sync (which makes the last
fsync pointless)?

2) Why do we need to write twice to the range [0, 23)?

Wouldn't something more simple like this work too:

$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil

# sync to commit the transaction and make sure dump-tree sees the fs tree with
# the prealloc extent when we try to get its bytenr.
sync
ino=$(stat -c '%i' $f.evil)
logical=$(get_prealloc_offset $ino)

# Do a write that will result in an inline extent.
$XFS_IO_PROG -c "pwrite -q 0 23" $f.evil

# A bunch of inodes to stuff dir items in front of the file extent items.
for i in $(seq 122); do
     $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i
done

# Flush all dealloc so we get all the file extent items inserted in the fs tree.
sync

Please add comments about each necessary step, as I've done there.
Otherwise it's very hard to understand why those steps are needed...

I'm surprised no one commented on this before.
I'm comfortable with btrfs' internals and yet I can't understand why
there are so many steps, and what exactly each one is supposed to
accomplish.

> +
> +# a bunch of inodes to stuff dir items in front of the extent items
> +for i in $(seq 122); do
> +       $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i
> +done
> +sync
> +
> +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch
> +_scratch_unmount

The unmount is pointless, fstests framework will do that for us.

Finally I would also like to see the _fixed_by_kernel_commit
annotation in the test.
Since the fix is not yet in Linus' tree, in IMO you can replace the
commit hash with "xxxx..." and later update the test once it's merged
in Linus' tree.

Thanks.

> +
> +echo "Silence is golden"
> +status=0
> +exit
> diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out
> new file mode 100644
> index 00000000..c5906093
> --- /dev/null
> +++ b/tests/btrfs/279.out
> @@ -0,0 +1,2 @@
> +QA output created by 279
> +Silence is golden
> --
> 2.38.1
>



[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux