Re: [PATCH v2] btrfs/311: new test for devt change between mounts

Filipe Manana <fdmanana@xxxxxxxxxx> · Fri, 8 Mar 2024 11:55:22 +0000

On Thu, Mar 7, 2024 at 8:48 PM Boris Burkov <boris@xxxxxx> wrote:
>
> On Tue, Mar 05, 2024 at 12:13:07PM +0000, Filipe Manana wrote:
> > On Fri, Mar 1, 2024 at 12:02 AM Boris Burkov <boris@xxxxxx> wrote:
> > >
> > > It is possible to confuse the btrfs device cache (fs_devices) by
> > > starting with a multi-device filesystem, then removing and re-adding a
> > > device in a way which changes its dev_t while the filesystem is
> > > unmounted. After this procedure, if we remount, then we are in a funny
> > > state where struct btrfs_device's "devt" field does not match the bd_dev
> > > of the "bdev" field. I would say this is bad enough, as we have violated
> > > a pretty clear invariant.
> > >
> > > But for style points, we can then remove the extra device from the fs,
> > > making it a single device fs, which enables the "temp_fsid" feature,
> > > which permits multiple separate mounts of different devices with the
> > > same fsid. Since btrfs is confused and *thinks* there are different
> > > devices (based on device->devt), it allows a second redundant mount of
> > > the same device (not a bind mount!). This then allows us to corrupt the
> > > original mount by doing stuff to the one that should be a bind mount.
> > >
> > > This is fixed by the combination of the kernel patch: > > btrfs: support device name lookup in forget
> > > and the btrfs-progs patches:
> > > btrfs-progs: allow btrfs device scan -u on dead dev
> > > btrfs-progs: add udev rule to forget removed device
> >
> > May I suggest to make this more readable, easier to the eye?
> > My inserting blank lines and indenting the lines with the patch
> > subjects, like for example:
> >
> > """
> > This is fixed by the combination of the kernel patch:
> >
> >     btrfs: support device name lookup in forget
> >
> > and the btrfs-progs patches:
> >
> >    btrfs-progs: allow btrfs device scan -u on dead dev
> >    btrfs-progs: add udev rule to forget removed device
> > """
> >
> > And these should be placed in the test case itself with:
> >
> >    _fixed_by_git_commit btrfs-progs xxxxxxxxxx "btrfs-progs: allow
> > btrfs device scan -u on dead dev"
> >
> > For btrfs-progs commits, and for the kernel:
> >
> >    _fixed_by_kernel_commit xxxxxxxxxxxx "btrfs: support device name
> > lookup in forget"
>
> I'll do so going forward, thanks.
>
> >
> > I see however that there's discussion for those patches between you,
> > Anand and David, and it
> > seems there's a chance those patches won't be merged to fix this bug,
> > especially the kernel one
> > for which Anand submitted an alternative. If those are added to the
> > test case itself, can always be
> > updated later if needed.
>
> I'm leaning towards omitting this part from the commit message and we
> can update the test case once we pick a winner. Is that OK with you?

Yes, thanks.

>
> >
> > Also avoid putting the test number in the subject - there's no
> > guarantee it will end up with that number when merged.
> >
> > >
> > > Signed-off-by: Boris Burkov <boris@xxxxxx>
> > > ---
> > >  common/config       |   1 +
> > >  common/rc           |   4 ++
> > >  tests/btrfs/311     | 101 ++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/btrfs/311.out |   2 +
> > >  4 files changed, 108 insertions(+)
> > >  create mode 100755 tests/btrfs/311
> > >  create mode 100644 tests/btrfs/311.out
> > >
> > > diff --git a/common/config b/common/config
> > > index a3b15b96f..43b517fda 100644
> > > --- a/common/config
> > > +++ b/common/config
> > > @@ -235,6 +235,7 @@ export BLKZONE_PROG="$(type -P blkzone)"
> > >  export GZIP_PROG="$(type -P gzip)"
> > >  export BTRFS_IMAGE_PROG="$(type -P btrfs-image)"
> > >  export BTRFS_MAP_LOGICAL_PROG=$(type -P btrfs-map-logical)
> > > +export PARTED_PROG="$(type -P parted)"
> > >
> > >  # use 'udevadm settle' or 'udevsettle' to wait for lv to be settled.
> > >  # newer systems have udevadm command but older systems like RHEL5 don't.
> > > diff --git a/common/rc b/common/rc
> > > index 30c44dddd..8e009aca9 100644
> > > --- a/common/rc
> > > +++ b/common/rc
> > > @@ -5375,6 +5375,10 @@ _require_unshare() {
> > >                 _notrun "unshare $*: command not found, should be in util-linux"
> > >  }
> > >
> > > +_require_parted() {
> >
> > These three last functions from common/rc use that style, the { after
> > the function name in the same line,
> > but everywhere else the { is on a new line, and that's the most common
> > style in fstests.
> > I wish we had some consistency.
> >
> > > +       $PARTED_PROG --list &>/dev/null || _notrun "parted: command not found"
> >
> > Why not just call:
> >
> > _require_command "$PARTED_PROG" parted
> >
> > Could even do that in the test and no need for a common function in
> > this file, as we do in many other tests
> > (grep for "'_require_command'" in tests/generic for example).
> >
> > > +}
> > > +
> > >  # Return a random file in a directory. A directory is *not* followed
> > >  # recursively.
> > >  _random_file() {
> > > diff --git a/tests/btrfs/311 b/tests/btrfs/311
> > > new file mode 100755
> > > index 000000000..887c46ba0
> > > --- /dev/null
> > > +++ b/tests/btrfs/311
> > > @@ -0,0 +1,101 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (C) 2024 Meta, Inc. All Rights Reserved.
> > > +#
> > > +# FS QA Test 311
> > > +#
> > > +# Test an edge case of multi device volume management in btrfs.
> > > +# If a device changes devt between mounts of a multi device fs, we can trick
> > > +# btrfs into mounting the same device twice fully (not as a bind mount). From
> > > +# there, it is trivial to induce corruption.
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest auto quick volume
> >
> > Missing 'scrub' group.
> >
> > > +
> > > +# real QA test starts here
> > > +_supported_fs btrfs
> > > +_require_test
> > > +_require_parted
> > > +
> > > +_cleanup() {
> > > +       cd /
> > > +       umount $MNT
> > > +       umount $BIND
> > > +       losetup -d $DEV0
> > > +       losetup -d $DEV1
> > > +       losetup -d $DEV2
> > > +       rm $IMG0
> > > +       rm $IMG1
> > > +       rm $IMG2
> > > +}
> > > +
> > > +IMG0=$TEST_DIR/$$.img0
> > > +IMG1=$TEST_DIR/$$.img1
> > > +IMG2=$TEST_DIR/$$.img2
> > > +truncate -s 1G $IMG0
> > > +truncate -s 1G $IMG1
> > > +truncate -s 1G $IMG2
> > > +DEV0=$(losetup -f $IMG0 --show)
> > > +DEV1=$(losetup -f $IMG1 --show)
> > > +DEV2=$(losetup -f $IMG2 --show)
> > > +D0P1=$DEV0"p1"
> > > +D1P1=$DEV1"p1"
> > > +MNT=$TEST_DIR/mnt
> > > +BIND=$TEST_DIR/bind
> > > +
> > > +# Setup partition table with one partition on each device.
> > > +$PARTED_PROG $DEV0 'mktable gpt' --script
> > > +$PARTED_PROG $DEV1 'mktable gpt' --script
> > > +$PARTED_PROG $DEV0 'mkpart mypart 1M 100%' --script
> > > +$PARTED_PROG $DEV1 'mkpart mypart 1M 100%' --script
> > > +
> > > +# mkfs with two devices to avoid clearing devices on close
> > > +# single raid to allow removing DEV2.
> > > +$MKFS_BTRFS_PROG -f -msingle -dsingle $D0P1 $DEV2 &>/dev/null
> >
> > Please redirect to the $seqres.full instead, and _fail if mkfs fails.
> > That's what we do nowadays due to unpleasant surprises in the past.
> >
> > > +
> > > +# Cycle mount the two device fs to populate both devices into the
> > > +# stale device cache.
> > > +mkdir -p $MNT
> > > +mount $D0P1 $MNT
> >
> > Use the _mount() helper.
> >
> > > +umount $MNT
> >
> > We use $UMOUNT_PROG in fstests.
> >
> > > +
> > > +# Swap the partition dev_ts. This leaves the dev_t in the cache out of date.
> > > +$PARTED_PROG $DEV0 'rm 1' --script
> > > +$PARTED_PROG $DEV1 'rm 1' --script
> > > +$PARTED_PROG $DEV1 'mkpart mypart 1M 100%' --script
> > > +$PARTED_PROG $DEV0 'mkpart mypart 1M 100%' --script
> > > +
> > > +# Mount with mismatched dev_t!
> > > +mount $D0P1 $MNT || _fail "failed to remount; don't proceed and do dangerous stuff on raw mount point"
> >
> > Same here.
> >
> > > +
> > > +# Remove the extra device to bring temp-fsid back in the fray.
> > > +$BTRFS_UTIL_PROG device remove $DEV2 $MNT
> > > +
> > > +# Create the would be bind mount.
> > > +mkdir -p $BIND
> > > +mount $D0P1 $BIND
> >
> > Same here.
> >
> > > +mount_show=$($BTRFS_UTIL_PROG filesystem show $MNT)
> > > +bind_show=$($BTRFS_UTIL_PROG filesystem show $BIND)
> > > +# If they're different, we are in trouble.
> > > +[ "$mount_show" = "$bind_show" ] || echo "$mount_show != $bind_show"
> > > +
> > > +# Now really prove it by corrupting the first mount with the second.
> > > +for i in $(seq 20); do
> > > +       $XFS_IO_PROG -f -c "pwrite 0 50M" $MNT/foo.$i >$seqres.full 2>&1
> >
> > This is overriding the .full file, use >>
> >
> > > +done
> > > +for i in $(seq 20); do
> > > +       $XFS_IO_PROG -f -c "pwrite 0 50M" $BIND/foo.$i >$seqres.full 2>&1
> >
> > Same here, this is overriding the .full file, use >>
> >
> > > +done
> > > +sync
> >
> > Can we please have a comment mentioning why the sync is needed?
> >
> > > +find $BIND -type f -delete
> > > +sync
> >
> > Same here.
> >
> > > +
> > > +# This should blow up both mounts, if the writes somehow didn't overlap at all.
> > > +$FSTRIM_PROG $BIND
> >
> > Since it's using the fstrim program and needs trim to be supported,
> > the test misses a:
> >
> > _require_batched_discard "$TEST_DIR"
> >
> > at the top.
> >
> > > +echo 3 > /proc/sys/vm/drop_caches
> >
> > Please add a comment mentioning why we are dropping caches.
> >
> > > +$BTRFS_UTIL_PROG scrub start -B $MNT >>$seqres.full 2>&1
> >
> > The test passes whether scrub fails or succeeds, as it's redirecting
> > stdout and stderr to the .full file and ignoring the exit status of
> > the command.
> >
> > The ideal fstests way is to put the expected output in the golden
> > output file and just call the command without redirecting anything.
> > If that's not doable for some reason, at the very least do a (...) ||
> > echo "Scrub failed, check $seqres.full for details."
> >
> > Thanks.
> >
> > > +
> > > +# success, all done
> > > +echo "Silence is golden"
> > > +status=0
> > > +exit
> > > diff --git a/tests/btrfs/311.out b/tests/btrfs/311.out
> > > new file mode 100644
> > > index 000000000..62f253029
> > > --- /dev/null
> > > +++ b/tests/btrfs/311.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 311
> > > +Silence is golden
> > > --
> > > 2.43.0
> > >
> > >