On 26.01.21 12:14, Guoqing Jiang wrote:
Hi Donald,
On 1/26/21 10:50, Donald Buczek wrote:
[...]
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2d21c298ffa7..f40429843906 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4687,11 +4687,13 @@ action_store(struct mddev *mddev, const char *page, size_t len)
                         clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
                 if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
                     mddev_lock(mddev) == 0) {
+                        set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
                         flush_workqueue(md_misc_wq);
                         if (mddev->sync_thread) {
                                 set_bit(MD_RECOVERY_INTR, &mddev->recovery);
                                 md_reap_sync_thread(mddev);
                         }
+                        clear_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
                         mddev_unlock(mddev);
                 }
         } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
Yes, it could break the deadlock, but I am not sure it is the right way. We only set ALLOW_SB_UPDATE in suspend, which makes sense because I/O is quiesced there; the write-idle action can't guarantee the same thing.
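For reference, mddev_suspend only raises the flag around the point where active_io drains, roughly like this (trimmed from my reading of v5.11 md.c, so treat it as a sketch rather than the exact body):

void mddev_suspend(struct mddev *mddev)
{
        /* ... */
        set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
        smp_mb__after_atomic();
        wait_event(mddev->sb_wait, atomic_read(&mddev->active_io) == 0);
        mddev->pers->quiesce(mddev, 1);
        clear_bit_unlock(MD_ALLOW_SB_UPDATE, &mddev->flags);
        wait_event(mddev->sb_wait, !test_bit(MD_UPDATING_SB, &mddev->flags));
        /* ... */
}

So while the flag is set there, the array is guaranteed to be quiescing, which is not true on the write-idle path.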
Thinking of (and documenting) MD_ALLOW_SB_UPDATE as "the holder of reconfig_mutex promises not to make any changes which would exclude superblocks from being written" might make it easier to accept the usage.
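Something along these lines in md.h, say (the flag already exists in enum mddev_flags; only the comment wording is my suggestion):

enum mddev_flags {
        /* ... */
        MD_ALLOW_SB_UPDATE,     /* md_check_recovery may update the metadata
                                 * even though the reconfig_mutex is held,
                                 * because the holder promises not to make
                                 * changes that would exclude superblocks
                                 * from being written */
        /* ... */
};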
I am not sure it is safe to set the flag here, since write-idle can't prevent I/O from the filesystem, while mddev_suspend can guarantee that.
I would prefer to make the resync thread not wait forever here.
[...]
- sh = raid5_get_active_stripe(conf, new_sector, previous,
+ sh = raid5_get_active_stripe(conf, new_sector, previous, 0,
Mistake here: the new argument was added as the fourth instead of the third.
Thanks for checking.
[...]
Unfortunately, this patch did not fix the issue.
root@sloth:~/linux# cat /proc/$(pgrep md3_resync)/stack
[<0>] raid5_get_active_stripe+0x1e7/0x6b0
[<0>] raid5_sync_request+0x2a7/0x3d0
[<0>] md_do_sync.cold+0x3ee/0x97c
[<0>] md_thread+0xab/0x160
[<0>] kthread+0x11b/0x140
[<0>] ret_from_fork+0x22/0x30
which is (according to objdump -d -Sl drivers/md/raid5.o) at https://elixir.bootlin.com/linux/v5.11-rc5/source/drivers/md/raid5.c#L735
Isn't it still the case that the superblocks are not written, therefore stripes are not processed, and therefore the number of active stripes does not decrease? Who is expected to wake up the conf->wait_for_stripe waiters?
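For context, this is the wait the resync thread is stuck in at that line, as far as I can see (v5.11 raid5_get_active_stripe, lightly trimmed):

        wait_event_lock_irq(conf->wait_for_stripe,
                            !list_empty(conf->inactive_list + hash) &&
                            (atomic_read(&conf->active_stripes)
                             < (conf->max_nr_stripes * 3 / 4)
                             || !test_bit(R5_INACTIVE_BLOCKED,
                                          &conf->cache_state)),
                            *(conf->hash_locks + hash));

Nothing in that condition can become true if no stripes complete, so the waiter depends entirely on someone releasing stripes and calling wake_up.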
Hmm, how about waking the waiter up in the while loop of raid5d? Something like:
@@ -6520,6 +6532,11 @@ static void raid5d(struct md_thread *thread)
                         md_check_recovery(mddev);
                         spin_lock_irq(&conf->device_lock);
                 }
+
+                if ((atomic_read(&conf->active_stripes)
+                     < (conf->max_nr_stripes * 3 / 4) ||
+                    (test_bit(MD_RECOVERY_INTR, &mddev->recovery))))
+                        wake_up(&conf->wait_for_stripe);
         }
         pr_debug("%d stripes handled\n", handled);
Hmm... With this patch on top of your other one, we still have the basic symptoms (md3_raid6 busy looping), but the sync thread is now hanging at:
root@sloth:~# cat /proc/$(pgrep md3_resync)/stack
[<0>] md_do_sync.cold+0x8ec/0x97c
[<0>] md_thread+0xab/0x160
[<0>] kthread+0x11b/0x140
[<0>] ret_from_fork+0x22/0x30
instead, which is https://elixir.bootlin.com/linux/latest/source/drivers/md/md.c#L8963
And, unlike before, "md: md3: data-check interrupted." from the pr_info two lines above now appears in dmesg.
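If I read the line numbers correctly, the thread now gets past raid5_get_active_stripe but blocks in md_do_sync waiting for the outstanding resync requests to drain (v5.11 md.c, just after that pr_info):

        blk_finish_plug(&plug);
        wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));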
Best
Donald
If the issue still appears, then I will change the waiter to break out just when MD_RECOVERY_INTR is set, something like:
        wait_event_lock_irq(conf->wait_for_stripe,
                            (test_bit(MD_RECOVERY_INTR, &mddev->recovery) && sync_req) ||
                            /* the previous condition */,
                            *(conf->hash_locks + hash));
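Spelled out against the existing condition, it would look roughly like the below (a sketch only, not tested; sync_req is assumed to be the parameter added by the earlier patch):

        wait_event_lock_irq(conf->wait_for_stripe,
                            (test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
                             sync_req) ||
                            (!list_empty(conf->inactive_list + hash) &&
                             (atomic_read(&conf->active_stripes)
                              < (conf->max_nr_stripes * 3 / 4)
                              || !test_bit(R5_INACTIVE_BLOCKED,
                                           &conf->cache_state))),
                            *(conf->hash_locks + hash));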
Thanks,
Guoqing