On Fri, Dec 16, 2022 at 11:10:26AM +0000, Filipe Manana wrote: > On Wed, Dec 14, 2022 at 11:15 PM Boris Burkov <boris@xxxxxx> wrote: > > > > If we create a file that has an inline extent followed by a prealloc > > extent, then attempt to use the logical to inode ioctl on the prealloc > > extent, but in the overwritten range, backref resolution will process > > the inline extent. Depending on the leaf eb layout, this can panic. > > Add a new test for this condition. In the long run, we can add spew when > > we read out-of-bounds fields of inline extent items and simplify this > > test to look for dmesg warnings rather than trying to force a fairly > > fragile panic (dependent on non-standardized details of leaf layout). > > > > The test causes a kernel panic unless: > > btrfs: fix logical_ino ioctl panic > > is applied to the kernel. > > > > Signed-off-by: Boris Burkov <boris@xxxxxx> > > So in addition to feedback already received from David and Zorro, I > have some comments inlined below. > > > --- > > tests/btrfs/279 | 95 +++++++++++++++++++++++++++++++++++++++++++++ > > tests/btrfs/279.out | 2 + > > 2 files changed, 97 insertions(+) > > create mode 100755 tests/btrfs/279 > > create mode 100644 tests/btrfs/279.out > > > > diff --git a/tests/btrfs/279 b/tests/btrfs/279 > > new file mode 100755 > > index 00000000..ef77f84b > > --- /dev/null > > +++ b/tests/btrfs/279 > > @@ -0,0 +1,95 @@ > > +#! /bin/bash > > +# SPDX-License-Identifier: GPL-2.0 > > +# Copyright (c) 2022 Meta Platforms, Inc. All Rights Reserved. > > +# > > +# FS QA Test 279 > > +# > > +# Given a file with extents: > > +# [0 : 4096) (inline) > > +# [4096 : N] (prealloc) > > +# if a user uses the ioctl BTRFS_IOC_LOGICAL_INO[_V2] asking for the file of the > > +# non-inline extent, it results in reading the offset field of the inline > > +# extent, which is meaningless (it is full of user data..). If we are > > +# particularly lucky, it can be past the end of the extent buffer, resulting in > > +# a crash. > > +# > > +. ./common/preamble > > +_begin_fstest auto quick > > + > > +# Import common functions. > > +. ./common/filter > > +. ./common/dmlogwrites > > + > > +# real QA test starts here > > + > > +# Modify as appropriate. > > +_supported_fs btrfs > > +_require_scratch > > +_require_xfs_io_command "falloc" > > Here it should be: > > _require_xfs_io_command "falloc" "-k" > > > +_require_xfs_io_command "fsync" > > +_require_xfs_io_command "pwrite" > > This is the first time I'm seeing a test requiring xfs_io to support > the pwrite and fsync commands. > Presumably there aren't any because either these commands always > existed or they have been around for a very long time. > > > + > > +dump_tree() { > > + $BTRFS_UTIL_PROG inspect-internal dump-tree $SCRATCH_DEV > > +} > > + > > +get_extent_data() { > > + local ino=$1 > > + dump_tree $SCRATCH_DEV | grep -A4 "($ino EXTENT_DATA " > > +} > > + > > +get_prealloc_offset() { > > + local ino=$1 > > + get_extent_data $ino | grep "disk byte" | awk '{print $5}' > > In fstests we use $AWK_PROG instead of bare 'awk'. > > > +} > > + > > +# This test needs to create conditions s.t. the special inode's inline extent > > +# is the first item in a leaf. Therefore, fix a leaf size and add > > +# items that are otherwise not necessary to reproduce the inline-prealloc > > +# condition to get to such a state. > > +# > > +# Roughly, the idea for getting the right item fill is to: > > +# 1. create an extra file with a variable sized inline extent item > > +# 2. create our evil file that will cause the panic > > +# 3. create a whole bunch of files to create a bunch of dir/index items > > +# 4. size the variable extent item s.t. the evil extent item is item 0 in the > > +# next leaf > > +# > > +# We do it in this somewhat convoluted way because the dir and index items all > > +# come before any inode, inode_ref, or extent_data items. So we can predictably > > +# create a bunch of them, then sneak in a funny sized extent to round out the > > +# difference. > > + > > +_scratch_mkfs "--nodesize 16k" >/dev/null > > So this will fail on a machine with a 64K page size (PPC for e.g.) > using a kernel or btrfs-progs without subpage sector size support. > That will make mkfs output an error to stderr and cause the test to fail. > > Please use a 64K node size and adapt the number of files below so that > we get the problematic leaf layout to trigger > the bug. > > Like this the test will be able to run, and reproduce the bug on an > unpatched kernel, on machines with any page size > and on kernels without subpage sector size support (thinking of SLE > kernels for example). > > That's generally what we do when we need to exercise a particular leaf > layout and allow the test to run on machines > with any page size. For example btrfs/239 and btrfs/154 do this. Good point, I'll re-do it for 64k to make it fully general as that was my intent in setting a fixed nodesize. > > > +_scratch_mount > > + > > +f=$SCRATCH_MNT/f > > + > > +# the variable extra "leaf padding" file > > +$XFS_IO_PROG -fc "pwrite -q 0 700" $f.pad > > + > > +# the evil file with an inline extent followed by a prealloc extent > > +# created by falloc with keep-size, then two non-truncating writes to the front > > +touch $f.evil > > +$XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil > > +$XFS_IO_PROG -fc fsync $f.evil > > +ino=$(stat -c '%i' $f.evil) > > +logical=$(get_prealloc_offset $ino) > > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil > > +$XFS_IO_PROG -fc fsync $f.evil > > +$XFS_IO_PROG -fc "pwrite -q 0 23" $f.evil > > +$XFS_IO_PROG -fc fsync $f.evil > > +sync > > This is complex, and it doesn't provide any comments regarding: > > 1) Why all this frenzy of fsync and a final sync (which makes the last > fsync pointless)? > > 2) Why do we need to write twice to the range [0, 23)? Honestly, I'm not sure. A single write results in a regular extent while two (separated by an fsync) results in an inline extent. Since we don't consider the inline extent a bug (which I inferred from I think your previous discussion with Qu for a similar case), I didn't prioritize digging into that behavior while working on this fix/test. > > Wouldn't something more simple like this work too: > > $XFS_IO_PROG -fc "falloc -k 0 1m" $f.evil > > # sync to commit the transaction and make sure dump-tree sees the fs tree with > # the prealloc extent when we try to get its bytenr. > sync > ino=$(stat -c '%i' $f.evil) > logical=$(get_prealloc_offset $ino) > > # Do a write that will result in an inline extent. > $XFS_IO_PROG -c "pwrite -q 0 23" $f.evil > > # A bunch of inodes to stuff dir items in front of the file extent items. > for i in $(seq 122); do > $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i > done > > # Flush all dealloc so we get all the file extent items inserted in the fs tree. > sync > > Please add comments about each necessary step, as I've done there. > Otherwise it's very hard to understand why those steps are needed... > > I'm surprised no one commented on this before. > I'm comfortable with btrfs' internals and yet I can't understand why > there are so many steps, and what exactly each one is supposed to > accomplish. I like fstests having a literate style, so I'm happy to do this, but I must point out that it's far from universal in the repo... > > > + > > +# a bunch of inodes to stuff dir items in front of the extent items > > +for i in $(seq 122); do > > + $XFS_IO_PROG -fc "pwrite -q 0 8192" $f.$i > > +done > > +sync > > + > > +btrfs inspect-internal logical-resolve $logical $SCRATCH_MNT | _filter_scratch > > +_scratch_unmount > > The unmount is pointless, fstests framework will do that for us. > > Finally I would also like to see the _fixed_by_kernel_commit > annotation in the test. > Since the fix is not yet in Linus' tree, in IMO you can replace the > commit hash with "xxxx..." and later update the test once it's merged > in Linus' tree. > > Thanks. Thanks for the review. > > > + > > +echo "Silence is golden" > > +status=0 > > +exit > > diff --git a/tests/btrfs/279.out b/tests/btrfs/279.out > > new file mode 100644 > > index 00000000..c5906093 > > --- /dev/null > > +++ b/tests/btrfs/279.out > > @@ -0,0 +1,2 @@ > > +QA output created by 279 > > +Silence is golden > > -- > > 2.38.1 > >