On Wed, Nov 06, 2024 at 04:13:28PM +1030, Qu Wenruo wrote: > [BUG] > When there is a process in the background remounting a btrfs, switching > between RO/RW, then another process try to mount another subvolume of > the same btrfs read-only, we can hit a race causing the RW mount to fail > with -EBUSY: > > [CAUSE] > During the btrfs mount, to support mounting different subvolumes with > different RO/RW flags, we have a small hack during the mount: > > Retry with matching RO flags if the initial mount fail with -EBUSY. > > The problem is, during that retry we do not hold any super block lock > (s_umount), this meanings there can be a remount process changing the RO > flags of the original fs super block. > > If so, we can have an EBUSY error during retry. > And this time we treat any failure as an error, without any retry and > cause the above EBUSY mount failure. > > [FIX] > The fix is already sent to the mailing list. > The fix is to allow btrfs to have different RO flag between super block > and mount point during mount, and if the RO flag mismatch, reconfigure > the fs to RW with s_umount hold, so that there will be no race. > > [TEST CASE] > The test case will create two processes: > > - Remounting an existing subvolume mount point > Switching between RO and RW > > - Mounting another subvolume RW > After a successful mount, unmount and retry. > > This is enough to trigger the -EBUSY error in less than 5 seconds. > To be extra safe, the test case will run for 10 seconds at least, and > follow TIME_FACTOR for extra loads. > > Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> > Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx> > --- > Changelog: > v4: > - Add the missing reviewed-by tag from Filipe > > v3: > - Also remove the default temporaray files in cleanup > > - Unset mount/remount pid and avoid killing them if unset during cleanup > > - Remove the mount points to avoid name conflicts > > - Extra comments on the the final unmounts for both mount point and > cleanup > > v2: > - Better signal handling for both remount and mount workload > Need to do the extra handling for both workload > > - Wait for any child process to exit before cleanup > > - Keep the stderr of the final rm command > > - Keep the stderr of the unmount of $subv1_mount > Which should always be mounted. > > - Use "$UMOUNT_PROG" instead of direct "umount" > > - Use "_mount" instead of direct "mount" > > - Fix a typo of "block" > --- Thanks for this new test case, this version is good to me: Reviewed-by: Zorro Lang <zlang@xxxxxxxxxx> > tests/btrfs/325 | 107 ++++++++++++++++++++++++++++++++++++++++++++ > tests/btrfs/325.out | 2 + > 2 files changed, 109 insertions(+) > create mode 100755 tests/btrfs/325 > create mode 100644 tests/btrfs/325.out > > diff --git a/tests/btrfs/325 b/tests/btrfs/325 > new file mode 100755 > index 00000000..0f4984f1 > --- /dev/null > +++ b/tests/btrfs/325 > @@ -0,0 +1,107 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved. > +# > +# FS QA Test 325 > +# > +# Test that mounting a subvolume read-write will success, with another > +# subvolume being remounted RO/RW at background > +# > +. ./common/preamble > +_begin_fstest auto quick mount remount > + > +_fixed_by_kernel_commit xxxxxxxxxxxx \ > + "btrfs: fix mount failure due to remount races" > + > +_cleanup() > +{ > + cd / > + rm -rf -- $tmp.* > + [ -n "$mount_pid" ] && kill $mount_pid &> /dev/null > + [ -n "$remount_pid" ] && kill $remount_pid &> /dev/null > + wait > + $UMOUNT_PROG "$subv1_mount" &> /dev/null > + $UMOUNT_PROG "$subv2_mount" &> /dev/null > + rm -rf -- "$subv1_mount" "$subv2_mount" > +} > + > +# For the extra mount points > +_require_test > +_require_scratch > + > +_scratch_mkfs >> $seqres.full 2>&1 > +_scratch_mount > +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol1 >> $seqres.full > +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol2 >> $seqres.full > +_scratch_unmount > + > +subv1_mount="$TEST_DIR/subvol1_mount" > +subv2_mount="$TEST_DIR/subvol2_mount" > +rm -rf "$subv1_mount" "$subv2_mount" > +mkdir -p "$subv1_mount" > +mkdir -p "$subv2_mount" > +_mount "$SCRATCH_DEV" "$subv1_mount" -o subvol=subvol1 > + > +# Subv1 remount workload, mount the subv1 and switching it between > +# RO and RW. > +remount_workload() > +{ > + trap "wait; exit" SIGTERM > + while true; do > + _mount -o remount,ro "$subv1_mount" > + _mount -o remount,rw "$subv1_mount" > + done > +} > + > +remount_workload & > +remount_pid=$! > + > +# Subv2 rw mount workload > +# For unpatched kernel, this will fail with -EBUSY. > +# > +# To maintain the per-subvolume RO/RW mount, if the existing btrfs is > +# mounted RO, unpatched btrfs will retry with its RO flag reverted, > +# then reconfigure the fs to RW. > +# > +# But such retry has no super block lock hold, thus if another remount > +# process has already remounted the fs RW, the attempt will fail and > +# return -EBUSY. > +# > +# Patched kernel will allow the superblock and mount point RO flags > +# to differ, and then hold the s_umount to reconfigure the super block > +# to RW, preventing any possible race. > +mount_workload() > +{ > + trap "wait; exit" SIGTERM > + while true; do > + _mount "$SCRATCH_DEV" "$subv2_mount" > + $UMOUNT_PROG "$subv2_mount" > + done > +} > + > +mount_workload & > +mount_pid=$! > + > +sleep $(( 10 * $TIME_FACTOR )) > + > +kill "$remount_pid" "$mount_pid" > +wait > +unset remount_pid mount_pid > + > +# Subv1 is always mounted, thus the umount should never fail. > +$UMOUNT_PROG "$subv1_mount" > + > +# Subv2 may have already been unmounted, so here we ignore all output. > +# This may hide some errors like -EBUSY, but the next rm line would > +# detect any still mounted subvolume so we're still safe. > +$UMOUNT_PROG "$subv2_mount" &> /dev/null > + > +# If above unmount, especially subv2 is not properly unmounted, > +# the rm should fail with some error message > +rm -r -- "$subv1_mount" "$subv2_mount" > + > +echo "Silence is golden" > + > +# success, all done > +status=0 > +exit > diff --git a/tests/btrfs/325.out b/tests/btrfs/325.out > new file mode 100644 > index 00000000..cf13795c > --- /dev/null > +++ b/tests/btrfs/325.out > @@ -0,0 +1,2 @@ > +QA output created by 325 > +Silence is golden > -- > 2.46.0 > >