On Wed, Mar 02, 2022 at 01:43:16PM +0000, Sidong Yang wrote: > On Wed, Mar 02, 2022 at 12:10:08PM +0000, Filipe Manana wrote: > > Hi, Filipe! > Thanks for review. > > On Tue, Mar 01, 2022 at 03:19:30PM +0000, Sidong Yang wrote: > > > Test enabling/disable quota and creating/destroying qgroup repeatedly > > > in asynchronous and confirm it does not cause kernel hang. This is a > > > > in asynchronous -> in parallel > > Sure, Thanks! > > > > > regression test for the problem reported to linux-btrfs list [1]. > > > > It's worth mentioning the deadlock only happens starting with kernel 5.17-rc3. > > It only happens in 5.17-rc3 version? I didn't know about it. I'll add > mention about this. Well, in the kernel patch we have: Fixes: e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker") And that commit was introduced in 5.17-rc3. Maybe it deadlocked in a different way before that commit, perhaps in the way that e804861bd4e6 describes. However I haven't checked how it behaves on a kernel without that commit. But at least we know that currently it deadlocks at 5.17-rc3+. > > > > > > > > The hang was recreated using the test case and fixed by kernel patch > > > titled > > > > > > btrfs: qgroup: fix deadlock between rescan worker and remove qgroup > > > > > > [1] https://lore.kernel.org/linux-btrfs/20220228014340.21309-1-realwakka@xxxxxxxxx/ > > > > > > Signed-off-by: Sidong Yang <realwakka@xxxxxxxxx> > > > > In addition to Shinichiro's comments... > > > > > --- > > > tests/btrfs/262 | 54 +++++++++++++++++++++++++++++++++++++++++++++ > > > tests/btrfs/262.out | 2 ++ > > > 2 files changed, 56 insertions(+) > > > create mode 100755 tests/btrfs/262 > > > create mode 100644 tests/btrfs/262.out > > > > > > diff --git a/tests/btrfs/262 b/tests/btrfs/262 > > > new file mode 100755 > > > index 00000000..9be380f9 > > > --- /dev/null > > > +++ b/tests/btrfs/262 > > > @@ -0,0 +1,54 @@ > > > +#! /bin/bash > > > +# SPDX-License-Identifier: GPL-2.0 > > > +# Copyright (c) 2022 YOUR NAME HERE. All Rights Reserved. > > > +# > > > +# FS QA Test 262 > > > +# > > > +# Test the deadlock between qgroup and quota commands > > > > The test description should be a lot more clear. > > > > "the deadlock" is vague a gives the wrong idea we only ever had a single > > deadlock related to qgroups. "qgroup and quota commands" is confusing, > > and "qgroup" and "quota" are pretty much synonyms, and it should mention > > which commands. > > > > Plus what we want to test is that we can run some qgroup operations in > > parallel without triggering a deadlock, crash, etc. > > > > Perhaps something like: > > > > """ > > Test that running qgroup enable, create, destroy and disable commands in > > parallel does not result in a deadlock, a crash or any filesystem > > inconsistency. > > """ > > > Yeah, It was not clear. I found that this test is not only for checking > deadlock. But it checks that test runs without any problem. > > > > > > +# > > > +. ./common/preamble > > > +_begin_fstest auto qgroup > > > > Can also be added to the "quick" group. It takes 1 second in my slowest vm. > > Okay, Thanks! > > > > > + > > > +# Import common functions. > > > +. ./common/filter > > > + > > > +# real QA test starts here > > > + > > > +# Modify as appropriate. > > > +_supported_fs btrfs > > > + > > > +_require_scratch > > > + > > > +# Run command that enable/disable quota and create/destroy qgroup asynchronously > > > > With the more clear test description above, this can go away. > > Sure! > > > > > +qgroup_deadlock_test() > > > +{ > > > + _scratch_mkfs > /dev/null 2>&1 > > > + _scratch_mount > > > + echo "=== qgroup deadlock test ===" >> $seqres.full > > > > There's no point in echoing this message to the .full file, it provides no > > value at all, as testing that is all that this testcase does. > > I agree. This is pointless because it's simple test. > > > > > + > > > + pids=() > > > + for ((i = 0; i < 200; i++)); do > > > + $BTRFS_UTIL_PROG quota enable $SCRATCH_MNT 2>> $seqres.full & > > > + pids+=($!) > > > + $BTRFS_UTIL_PROG qgroup create 1/0 $SCRATCH_MNT 2>> $seqres.full & > > > + pids+=($!) > > > + $BTRFS_UTIL_PROG qgroup destroy 1/0 $SCRATCH_MNT 2>> $seqres.full & > > > + pids+=($!) > > > + $BTRFS_UTIL_PROG quota disable $SCRATCH_MNT 2>> $seqres.full & > > > + pids+=($!) > > > + done > > > + > > > + for pid in "${pids[@]}"; do > > > + wait $pid > > > + done > > > > As pointed before by Shinichiro, a simple 'wait' here is enough, no need to > > keep track of the PIDs. > > Yeah, I don't have to go hard way. > > > > > + > > > + _scratch_unmount > > > + _check_scratch_fs > > > > Not needed, the fstests framework automatically runs 'btrfs check' when a test > > finishes. Doing this explicitly is only necessary when we need to do several > > mount/unmount operations and want to check the fs is fine after each unmount > > and before the next mount. > > > > I didn't know about that. I don't need to check manually. > > > +} > > > + > > > +qgroup_deadlock_test > > > > There's no point in putting all the test code in a function, as the function > > is only called once. > > Of course! > > > > Otherwise it looks good, and the test works as advertised, it triggers a > > deadlock on 5.17-rc3+ kernel and passes on a patched kernel. > > > > Thanks for converting the reproducer into a test case. > > > > Thanks for detailed review. I'll back soon with v2. > > Thanks, > Sidong > > > + > > > +# success, all done > > > +echo "Silence is golden" > > > +status=0 > > > +exit > > > diff --git a/tests/btrfs/262.out b/tests/btrfs/262.out > > > new file mode 100644 > > > index 00000000..404badc3 > > > --- /dev/null > > > +++ b/tests/btrfs/262.out > > > @@ -0,0 +1,2 @@ > > > +QA output created by 262 > > > +Silence is golden > > > -- > > > 2.25.1 > > >