On 26.01.21 12:14, Guoqing Jiang wrote:
Hi Donald,
On 1/26/21 10:50, Donald Buczek wrote:
[...]
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2d21c298ffa7..f40429843906 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4687,11 +4687,13 @@ action_store(struct mddev *mddev, const char *page, size_t len)
                         clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
                 if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
                     mddev_lock(mddev) == 0) {
+                        set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
                         flush_workqueue(md_misc_wq);
                         if (mddev->sync_thread) {
                                 set_bit(MD_RECOVERY_INTR, &mddev->recovery);
                                 md_reap_sync_thread(mddev);
                         }
+                        clear_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
                         mddev_unlock(mddev);
                 }
         } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
Yes, it could break the deadlock, but I am not sure it is the right way. We only set ALLOW_SB_UPDATE in suspend, which makes sense because I/O is quiesced there; the write-idle action can't guarantee the same thing.
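For reference, mddev_suspend only raises the flag around the point where active_io drains, roughly like this (trimmed from my reading of v5.11 md.c, so treat it as a sketch rather than the exact body):

void mddev_suspend(struct mddev *mddev)
{
        /* ... */
        set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
        smp_mb__after_atomic();
        wait_event(mddev->sb_wait, atomic_read(&mddev->active_io) == 0);
        mddev->pers->quiesce(mddev, 1);
        clear_bit_unlock(MD_ALLOW_SB_UPDATE, &mddev->flags);
        wait_event(mddev->sb_wait, !test_bit(MD_UPDATING_SB, &mddev->flags));
        /* ... */
}

So while the flag is set there, the array is guaranteed to be quiescing, which is not true on the write-idle path.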
Thinking of (and documenting) MD_ALLOW_SB_UPDATE as "the holder of reconfig_mutex promises not to make any changes which would exclude superblocks from being written" might make it easier to accept the usage.
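Something along these lines in md.h, say (the flag already exists in enum mddev_flags; only the comment wording is my suggestion):

enum mddev_flags {
        /* ... */
        MD_ALLOW_SB_UPDATE,     /* md_check_recovery may update the metadata
                                 * even though the reconfig_mutex is held,
                                 * because the holder promises not to make
                                 * changes that would exclude superblocks
                                 * from being written */
        /* ... */
};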
I am not sure it is safe to set the flag here, since write-idle can't prevent I/O from the filesystem, while mddev_suspend can guarantee that.
I would prefer to make the resync thread not wait forever here.
[...]
- sh = raid5_get_active_stripe(conf, new_sector, previous,
+ sh = raid5_get_active_stripe(conf, new_sector, previous, 0,
Mistake here: the new argument was added as the fourth instead of the third.
Thanks for checking.
[...]
Unfortunately, this patch did not fix the issue.
root@sloth:~/linux# cat /proc/$(pgrep md3_resync)/stack
[<0>] raid5_get_active_stripe+0x1e7/0x6b0
[<0>] raid5_sync_request+0x2a7/0x3d0
[<0>] md_do_sync.cold+0x3ee/0x97c
[<0>] md_thread+0xab/0x160
[<0>] kthread+0x11b/0x140
[<0>] ret_from_fork+0x22/0x30
which is (according to objdump -d -Sl drivers/md/raid5.o) at https://elixir.bootlin.com/linux/v5.11-rc5/source/drivers/md/raid5.c#L735
Isn't it still the case that the superblocks are not written, therefore stripes are not processed, and therefore the number of active stripes does not decrease? Who is expected to wake up the conf->wait_for_stripe waiters?
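For context, this is the wait the resync thread is stuck in at that line, as far as I can see (v5.11 raid5_get_active_stripe, lightly trimmed):

        wait_event_lock_irq(conf->wait_for_stripe,
                            !list_empty(conf->inactive_list + hash) &&
                            (atomic_read(&conf->active_stripes)
                             < (conf->max_nr_stripes * 3 / 4)
                             || !test_bit(R5_INACTIVE_BLOCKED,
                                          &conf->cache_state)),
                            *(conf->hash_locks + hash));

Nothing in that condition can become true if no stripes complete, so the waiter depends entirely on someone releasing stripes and calling wake_up.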
Hmm, how about waking the waiter up in the while loop of raid5d? Something like:
@@ -6520,6 +6532,11 @@ static void raid5d(struct md_thread *thread)
                         md_check_recovery(mddev);
                         spin_lock_irq(&conf->device_lock);
                 }
+
+                if ((atomic_read(&conf->active_stripes)
+                     < (conf->max_nr_stripes * 3 / 4) ||
+                    (test_bit(MD_RECOVERY_INTR, &mddev->recovery))))
+                        wake_up(&conf->wait_for_stripe);
         }
         pr_debug("%d stripes handled\n", handled);
Hmm... With this patch on top of your other one, we still have the basic symptoms (md3_raid6 busy looping), but the sync thread is now hanging at:
root@sloth:~# cat /proc/$(pgrep md3_resync)/stack
[<0>] md_do_sync.cold+0x8ec/0x97c
[<0>] md_thread+0xab/0x160
[<0>] kthread+0x11b/0x140
[<0>] ret_from_fork+0x22/0x30
instead, which is https://elixir.bootlin.com/linux/latest/source/drivers/md/md.c#L8963
And, unlike before, "md: md3: data-check interrupted." from the pr_info two lines above now appears in dmesg.
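If I read the line numbers correctly, the thread now gets past raid5_get_active_stripe but blocks in md_do_sync waiting for the outstanding resync requests to drain (v5.11 md.c, just after that pr_info):

        blk_finish_plug(&plug);
        wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));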
Best
Donald
If the issue still appears, then I will change the waiter to break out just when MD_RECOVERY_INTR is set, something like:
        wait_event_lock_irq(conf->wait_for_stripe,
                            (test_bit(MD_RECOVERY_INTR, &mddev->recovery) && sync_req) ||
                            /* the previous condition */,
                            *(conf->hash_locks + hash));
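Spelled out against the existing condition, it would look roughly like the below (a sketch only, not tested; sync_req is assumed to be the parameter added by the earlier patch):

        wait_event_lock_irq(conf->wait_for_stripe,
                            (test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
                             sync_req) ||
                            (!list_empty(conf->inactive_list + hash) &&
                             (atomic_read(&conf->active_stripes)
                              < (conf->max_nr_stripes * 3 / 4)
                              || !test_bit(R5_INACTIVE_BLOCKED,
                                           &conf->cache_state))),
                            *(conf->hash_locks + hash));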
Thanks,
Guoqing