Re: [PATCH] btrfs: new test for logical inode resolution panic

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



On Fri, Dec 16, 2022 at 11:10:26AM +0000, Filipe Manana wrote:
> On Wed, Dec 14, 2022 at 11:15 PM Boris Burkov <boris@xxxxxx> wrote:
> >
> > If we create a file that has an inline extent followed by a prealloc
> > extent, then attempt to use the logical to inode ioctl on the prealloc
> > extent, but in the overwritten range, backref resolution will process
> > the inline extent. Depending on the leaf eb layout, this can panic.
> > Add a new test for this condition. In the long run, we can add spew when
> > we read out-of-bounds fields of inline extent items and simplify this
> > test to look for dmesg warnings rather than trying to force a fairly
> > fragile panic (dependent on non-standardized details of leaf layout).
> >
> > The test causes a kernel panic unless:
> > btrfs: fix logical_ino ioctl panic
> > is applied to the kernel.
> >
> > Signed-off-by: Boris Burkov <boris@xxxxxx>
> 
> So in addition to feedback already received from David and Zorro, I
> have some comments inlined below.
> 
> > ---
> >  tests/btrfs/279     | 95 +++++++++++++++++++++++++++++++++++++++++++++
> >  tests/btrfs/279.out |  2 +
> >  2 files changed, 97 insertions(+)
> >  create mode 100755 tests/btrfs/279
> >  create mode 100644 tests/btrfs/279.out
> >
> > diff --git a/tests/btrfs/279 b/tests/btrfs/279
> > new file mode 100755
> > index 00000000..ef77f84b
> > --- /dev/null
> > +++ b/tests/btrfs/279
> > @@ -0,0 +1,95 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2022 Meta Platforms, Inc.  All Rights Reserved.
> > +#
> > +# FS QA Test 279
> > +#
> > +# Given a file with extents:
> > +# [0 : 4096) (inline)
> > +# [4096 : N] (prealloc)
> > +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the
> > +# non-inline extent, it results in reading the offset field of the inline
> > +# extent, which is meaningless (it is full of user data..). If we are
> > +# particularly lucky, it can be past the end of the extent buffer, resulting in
> > +# a crash.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto quick
> > +
> > +# Import common functions.
> > +. ./common/filter
> > +. ./common/dmlogwrites
> > +
> > +# real QA test starts here
> > +
> > +# Modify as appropriate.
> > +_supported_fs btrfs
> > +_require_scratch
> > +_require_xfs_io_command "falloc"
> 
> Here it should be:
> 
> _require_xfs_io_command "falloc" "-k"
> 
> > +_require_xfs_io_command "fsync"
> > +_require_xfs_io_command "pwrite"
> 
> This is the first time I'm seeing a test requiring xfs_io to support
> the pwrite and fsync commands.
> Presumably there aren't any because either these commands always
> existed or they have been around for a very long time.
> 
> > +
> > +dump_tree() {
> > +       $BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV
> > +}
> > +
> > +get_extent_data() {
> > +       local ino=$1
> > +       dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA "
> > +}
> > +
> > +get_prealloc_offset() {
> > +       local ino=$1
> > +       get_extent_data $ino | grep "disk byte" | awk '{print $5}'
> 
> In fstests we use $AWK_PROG instead of bare 'awk'.
> 
> > +}
> > +
> > +# This test needs to create conditions s.t. the special inode's inline extent
> > +# is the first item in a leaf. Therefore, fix a leaf size and add
> > +# items that are otherwise not necessary to reproduce the inline-prealloc
> > +# condition to get to such a state.
> > +#
> > +# Roughly, the idea for getting the right item fill is to:
> > +# 1. create an extra file with a variable sized inline extent item
> > +# 2. create our evil file that will cause the panic
> > +# 3. create a whole bunch of files to create a bunch of dir/index items
> > +# 4. size the variable extent item s.t. the evil extent item is item 0 in the
> > +#    next leaf
> > +#
> > +# We do it in this somewhat convoluted way because the dir and index items all
> > +# come before any inode, inode_ref, or extent_data items. So we can predictably
> > +# create a bunch of them, then sneak in a funny sized extent to round out the
> > +# difference.
> > +
> > +_scratch_mkfs "--nodesize 16k" >/dev/null
> 
> So this will fail on a machine with a 64K page size (PPC for e.g.)
> using a kernel or btrfs-progs without subpage sector size support.
> That will make mkfs output an error to stderr and cause the test to fail.
> 
> Please use a 64K node size and adapt the number of files below so that
> we get the problematic leaf layout to trigger
> the bug.
> 
> Like this the test will be able to run, and reproduce the bug on an
> unpatched kernel, on machines with any page size
> and on kernels without subpage sector size support (thinking of SLE
> kernels for example).
> 
> That's generally what we do when we need to exercise a particular leaf
> layout and allow the test to run on machines
> with any page size. For example btrfs/239 and btrfs/154 do this.

Good point, I'll re-do it for 64k to make it fully general as that
was my intent in setting a fixed nodesize.

> 
> > +_scratch_mount
> > +
> > +f=$SCRATCH_MNT/f
> > +
> > +# the variable extra "leaf padding" file
> > +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad
> > +
> > +# the evil file with an inline extent followed by a prealloc extent
> > +# created by falloc with keep-size, then two non-truncating writes to the front
> > +touch $f.evil
> > +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil
> > +$XFS_IO_PROG -fc fsync $f.evil
> > +ino=$(stat -c '%i' $f.evil)
> > +logical=$(get_prealloc_offset $ino)
> > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
> > +$XFS_IO_PROG -fc fsync $f.evil
> > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil
> > +$XFS_IO_PROG -fc fsync $f.evil
> > +sync
> 
> This is complex, and it doesn't provide any comments regarding:
> 
> 1) Why all this frenzy of fsync and a final sync (which makes the last
> fsync pointless)?
> 
> 2) Why do we need to write twice to the range [0, 23)?

Honestly, I'm not sure. A single write results in a regular extent while
two (separated by an fsync) results in an inline extent. Since we don't
consider the inline extent a bug (which I inferred from I think your
previous discussion with Qu for a similar case), I didn't prioritize
digging into that behavior while working on this fix/test.

> 
> Wouldn't something more simple like this work too:
> 
> $XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil
> 
> # sync to commit the transaction and make sure dump-tree sees the fs tree with
> # the prealloc extent when we try to get its bytenr.
> sync
> ino=$(stat -c '%i' $f.evil)
> logical=$(get_prealloc_offset $ino)
> 
> # Do a write that will result in an inline extent.
> $XFS_IO_PROG -c "pwrite -q 0 23" $f.evil
> 
> # A bunch of inodes to stuff dir items in front of the file extent items.
> for i in $(seq 122); do
>      $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i
> done
> 
> # Flush all dealloc so we get all the file extent items inserted in the fs tree.
> sync
> 
> Please add comments about each necessary step, as I've done there.
> Otherwise it's very hard to understand why those steps are needed...
> 
> I'm surprised no one commented on this before.
> I'm comfortable with btrfs' internals and yet I can't understand why
> there are so many steps, and what exactly each one is supposed to
> accomplish.

I like fstests having a literate style, so I'm happy to do this, but I
must point out that it's far from universal in the repo...

> 
> > +
> > +# a bunch of inodes to stuff dir items in front of the extent items
> > +for i in $(seq 122); do
> > +       $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i
> > +done
> > +sync
> > +
> > +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch
> > +_scratch_unmount
> 
> The unmount is pointless, fstests framework will do that for us.
> 
> Finally I would also like to see the _fixed_by_kernel_commit
> annotation in the test.
> Since the fix is not yet in Linus' tree, in IMO you can replace the
> commit hash with "xxxx..." and later update the test once it's merged
> in Linus' tree.
> 
> Thanks.

Thanks for the review.
> 
> > +
> > +echo "Silence is golden"
> > +status=0
> > +exit
> > diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out
> > new file mode 100644
> > index 00000000..c5906093
> > --- /dev/null
> > +++ b/tests/btrfs/279.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 279
> > +Silence is golden
> > --
> > 2.38.1
> >



[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux