On Fri, Nov 1, 2024 at 5:08 AM Qu Wenruo <wqu@xxxxxxxx> wrote: > > [BUG] > When there is a process in the background remounting a btrfs, switching > between RO/RW, then another process try to mount another subvolume of > the same btrfs read-only, we can hit a race causing the RW mount to fail > with -EBUSY: > > [CAUSE] > During the btrfs mount, to support mounting different subvolumes with > different RO/RW flags, we have a small hack during the mount: > > Retry with matching RO flags if the initial mount fail with -EBUSY. > > The problem is, during that retry we do not hold any super block lock > (s_umount), this meanings there can be a remount process changing the RO > flags of the original fs super block. > > If so, we can have an EBUSY error during retry. > And this time we treat any failure as an error, without any retry and > cause the above EBUSY mount failure. > > [FIX] > The fix is already sent to the mailing list. > The fix is to allow btrfs to have different RO flag between super block > and mount point during mount, and if the RO flag mismatch, reconfigure > the fs to RW with s_umount hold, so that there will be no race. > > [TEST CASE] > The test case will create two processes: > > - Remounting an existing subvolume mount point > Switching between RO and RW > > - Mounting another subvolume RW > After a successful mount, unmount and retry. > > This is enough to trigger the -EBUSY error in less than 5 seconds. > To be extra safe, the test case will run for 10 seconds at least, and > follow TIME_FACTOR for extra loads. > > Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> > --- > tests/btrfs/325 | 80 +++++++++++++++++++++++++++++++++++++++++++++ > tests/btrfs/325.out | 2 ++ > 2 files changed, 82 insertions(+) > create mode 100755 tests/btrfs/325 > create mode 100644 tests/btrfs/325.out > > diff --git a/tests/btrfs/325 b/tests/btrfs/325 > new file mode 100755 > index 00000000..d0713b39 > --- /dev/null > +++ b/tests/btrfs/325 > @@ -0,0 +1,80 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved. > +# > +# FS QA Test 325 > +# > +# Test that mounting a subvolume read-write will success, with another > +# subvolume being remounted RO/RW at background > +# > +. ./common/preamble > +_begin_fstest auto quick mount remount > + > +_fixed_by_kernel_commit xxxxxxxxxxxx \ > + "btrfs: fix mount failure due to remount races" > + > +_cleanup() > +{ > + cd / > + umount "$subv1_mount" &> /dev/null > + umount "$subv2_mount" &> /dev/null $UMOUNT_PROG > + rm -rf -- "$subv1_mount" &> /dev/null > + rm -rf -- "$subv2_mount" &> /dev/null > +} > + > +# For the extra mount points > +_require_test > +_require_scratch > + > +_scratch_mkfs >> $seqres.full 2>&1 > +_scratch_mount > +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol1 >> $seqres.full > +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol2 >> $seqres.full > +_scratch_unmount > + > +subv1_mount="$TEST_DIR/subvol1_mount" > +subv2_mount="$TEST_DIR/subvol2_mount" > +mkdir -p "$subv1_mount" > +mkdir -p "$subv2_mount" > + > +# Subv1 remount workload, mount the subv1 and switching it between > +# RO and RW. > +_mount "$SCRATCH_DEV" "$subv1_mount" -o subvol=subvol1 > +while _mount -o remount,ro "$subv1_mount"; do > + mount -o remount,rw "$subv1_mount" _mount here too. > +done & > +subv1_pid=$! > + > +# Subv2 rw mount workload > +# For unpatched kernel, this will fail with -EBUSY. > +# > +# To maintain the per-subvolume RO/RW mount, if the existing btrfs is > +# mounted RO, unpatched btrfs will retry with its RO flag reverted, > +# then reconfigure the fs to RW. > +# > +# But such retry has no super blocl lock hold, thus if another remount blocl -> block > +# process has already remounted the fs RW, the attempt will fail and > +# return -EBUSY. > +# > +# Patched kernel will allow the superblock and mount point RO flags > +# to differ, and then hold the s_umount to reconfigure the superblock > +# to RW, preventing any possible race. > +while _mount "$SCRATCH_DEV" "$subv2_mount "-o subvol=subvol2; do > + umount "$subv2_mount"; $UMOUNT_PROG > +done & > +subv2_pid=$! We should have the _cleanup function kill and wait for $subv1_pid and $subv2_pid in case the test is interrupted, something like: kill $subv1_pid $subv2_pid &> /dev/null wait $subv1_pid $subv2_pid &> /dev/null > + > +sleep $(( 10 * $TIME_FACTOR )) > + > +kill $subv1_pid > +kill $subv2_pid Add a 'wait' here, to make sure the umounts below don't fail in case the subvolumes are mounted. > +umount "$subv1_mount" &> /dev/null > +umount "$subv2_mount" &> /dev/null $UMOUNT_PROG > +rm -rf -- "$subv1_mount" &> /dev/null > +rm -rf -- "$subv2_mount" &> /dev/null Also, why these "&> /dev/null" redirections? If we have the 'wait', the umounts should succeed and the rm -fr shouldn't fail - in fact it's useful to test that they don't fail, that the umounts were successful in case the subvolumes were mounted. Thanks. > + > +echo "Silence is golden" > + > +# success, all done > +status=0 > +exit > diff --git a/tests/btrfs/325.out b/tests/btrfs/325.out > new file mode 100644 > index 00000000..cf13795c > --- /dev/null > +++ b/tests/btrfs/325.out > @@ -0,0 +1,2 @@ > +QA output created by 325 > +Silence is golden > -- > 2.46.0 > >