On 19.07.2024 09:02, Yu Kuai wrote:
Hi,

With some discussion and log collection, it looks like this is a deadlock
introduced by:

https://lore.kernel.org/r/20230825031622.1530464-8-yukuai1@xxxxxxxxxxxxxxx

The root cause is that:

1) New IO is blocked because the array is suspended;
2) md_start_sync suspends the array, and it is waiting for inflight IO to be
   done;
3) Inflight IO is waiting for md_start_sync to be done, from
   md_write_start()->flush_work().

Can you give the following patch a test?

Thanks!
Kuai

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 64693913ed18..10c2d816062a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8668,7 +8668,6 @@ void md_write_start(struct mddev *mddev, struct bio *bi)
 	BUG_ON(mddev->ro == MD_RDONLY);
 	if (mddev->ro == MD_AUTO_READ) {
 		/* need to switch to read/write */
-		flush_work(&mddev->sync_work);
 		mddev->ro = MD_RDWR;
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 		md_wakeup_thread(mddev->thread);

Hi Kuai,

With the patch you provided, the issue still reproduces.

Thanks,
Mateusz
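
For reference, the circular wait described in the quoted analysis can be modeled as a small wait-for graph. This is only a sketch in plain userspace C, not kernel code: the enum labels, the `waits_for` tables and `has_wait_cycle()` are hypothetical names made up for illustration, and the edges encode nothing more than the three numbered points above. Since the issue is reported to still reproduce with the patch applied, the real hang presumably involves waits that this model does not capture.

```c
/* Hypothetical labels for the three parties named in the analysis. */
enum waiter { NEW_IO, SYNC_WORK, INFLIGHT_IO, NR_WAITERS };

#define WAITS_FOR_NOTHING (-1)

/*
 * waits_for[i] is the party that i is blocked on, or WAITS_FOR_NOTHING.
 * Returns 1 if following the chain from 'start' revisits a party
 * (a circular wait), 0 if the chain ends.
 */
int has_wait_cycle(const int *waits_for, int n, int start)
{
	int seen[NR_WAITERS] = { 0 };
	int cur = start;

	while (cur != WAITS_FOR_NOTHING) {
		if (cur < 0 || cur >= n || cur >= NR_WAITERS)
			return 0;	/* malformed table, treat as no cycle */
		if (seen[cur])
			return 1;	/* came back to a party already visited */
		seen[cur] = 1;
		cur = waits_for[cur];
	}
	return 0;
}

/* Edges as described in the analysis, with flush_work() still present. */
const int before_patch[NR_WAITERS] = {
	[NEW_IO]      = SYNC_WORK,	/* 1) array is suspended by the sync work */
	[SYNC_WORK]   = INFLIGHT_IO,	/* 2) suspend waits for inflight IO */
	[INFLIGHT_IO] = SYNC_WORK,	/* 3) md_write_start()->flush_work() */
};

/* Same model with edge 3) dropped, which is what the patch intends. */
const int after_patch[NR_WAITERS] = {
	[NEW_IO]      = SYNC_WORK,
	[SYNC_WORK]   = INFLIGHT_IO,
	[INFLIGHT_IO] = WAITS_FOR_NOTHING,
};
```

In this model, `has_wait_cycle(before_patch, NR_WAITERS, NEW_IO)` finds the loop between the sync work and the inflight IO, while the same call on `after_patch` follows the chain to its end.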