On Tue, 23 May 2023 21:38:59 +0800
Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

> From: Yu Kuai <yukuai3@xxxxxxxxxx>
>
> The deadlock is described in [1], it's fixed first by [2], however,
> it turns out this commit will trigger other problems[3], hence this
> commit will be reverted and the deadlock is supposed to be fixed by [1].
>
> [1]
> https://lore.kernel.org/linux-raid/20230322064122.2384589-5-yukuai1@xxxxxxxxxxxxxxx/
> [2]
> https://lore.kernel.org/linux-raid/20220621031129.24778-1-guoqing.jiang@xxxxxxxxx/
> [3]
> https://lore.kernel.org/linux-raid/20230322064122.2384589-2-yukuai1@xxxxxxxxxxxxxxx/
>
> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
> ---
>  tests/24raid10deadlock              | 85 +++++++++++++++++++++++++++++
>  tests/24raid10deadlock.inject_error |  0
>  2 files changed, 85 insertions(+)
>  create mode 100644 tests/24raid10deadlock
>  create mode 100644 tests/24raid10deadlock.inject_error
>
> diff --git a/tests/24raid10deadlock b/tests/24raid10deadlock
> new file mode 100644
> index 00000000..27869840
> --- /dev/null
> +++ b/tests/24raid10deadlock
> @@ -0,0 +1,85 @@
> +devs="$dev0 $dev1 $dev2 $dev3"
> +runtime=120
> +pid=""
> +
> +set_up_injection()
> +{
> +	echo -1 > /sys/kernel/debug/fail_make_request/times
> +	echo 1 > /sys/kernel/debug/fail_make_request/probability
> +	echo 0 > /sys/kernel/debug/fail_make_request/verbose
> +	echo 1 > /sys/block/${1##*/}/make-it-fail
> +}
> +
> +clean_up_injection()
> +{
> +	echo 0 > /sys/block/${1##*/}/make-it-fail
> +	echo 0 > /sys/kernel/debug/fail_make_request/times
> +	echo 0 > /sys/kernel/debug/fail_make_request/probability
> +	echo 2 > /sys/kernel/debug/fail_make_request/verbose
> +}
> +
> +test_rdev()
> +{
> +	while true; do
> +		mdadm -f $md0 $1 &> /dev/null
> +		mdadm -r $md0 $1 &> /dev/null
> +		mdadm --zero-superblock $1 &> /dev/null
> +		mdadm -a $md0 $1 &> /dev/null
> +		sleep $2
> +	done
> +}
> +
> +test_write_action()
> +{
> +	while true; do
> +		echo frozen > /sys/block/md0/md/sync_action
> +		echo idle > /sys/block/md0/md/sync_action
> +		sleep 0.1
> +	done
> +}
> +
> +set_up_test()
> +{
> +	fio -h &> /dev/null || die "fio not found"
> +
> +	# create a simple raid10
> +	mdadm -Cv -R -n 4 -l10 $md0 $devs || die "create raid10 failed"
> +}
> +
> +clean_up_test()
> +{
> +	clean_up_injection $dev0
> +	kill -9 $pid
> +	pkill -9 fio
> +
> +	sleep 1
> +
> +	if ! mdadm -S $md0; then
> +		die "can't stop array, deadlock is probably triggered"
> +	fi

Stop may fail for different reasons; I see a failed stop as too broad
to be a marker of a "deadlock". I know that --stop still fails because
md is unable to clear sysfs attrs in the expected time (or at least it
was a problem a few years ago).

Is there a better way to check that? I would prefer a different, less
complicated action to exclude false positives.

In my IMSM environment I still see the md stop stress test failing
sporadically (1/100).

Thanks,
Mariusz
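
One possible way to reduce false positives from a single failed stop, sketched here only as an illustration: retry the stop a bounded number of times before calling die. The helper name retry_cmd, the attempt count, and the delay are hypothetical and not part of the mdadm test suite. A retry only filters out transient failures such as slow sysfs cleanup; a real deadlock would still fail every attempt, and a persistent failure still does not prove a deadlock.

```shell
# Hypothetical helper (not part of the mdadm test suite): run a command up to
# <attempts> times, sleeping <delay> seconds between failed attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry_cmd()
{
	_attempts=$1
	_delay=$2
	shift 2

	_i=1
	while [ "$_i" -le "$_attempts" ]; do
		"$@" && return 0
		# No point sleeping after the final failed attempt.
		if [ "$_i" -lt "$_attempts" ]; then
			sleep "$_delay"
		fi
		_i=$((_i + 1))
	done
	return 1
}

# Possible use in clean_up_test(), assuming $md0 and die() come from the
# test framework as in the quoted patch (the values 5 and 1 are arbitrary):
#	retry_cmd 5 1 mdadm -S $md0 || \
#		die "can't stop array after 5 attempts"
```

With this shape a sporadic failure on the first attempt would not fail the test, while an array that cannot be stopped at all is still reported.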