Hi Donald,
On 1/26/21 10:50, Donald Buczek wrote:
[...]
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2d21c298ffa7..f40429843906 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4687,11 +4687,13 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 		if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
 		    mddev_lock(mddev) == 0) {
+			set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
 			flush_workqueue(md_misc_wq);
 			if (mddev->sync_thread) {
 				set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 				md_reap_sync_thread(mddev);
 			}
+			clear_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
 			mddev_unlock(mddev);
 		}
 	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
Yes, it could break the deadlock issue, but I am not sure it is the
right way. We only set ALLOW_SB_UPDATE in suspend, which makes sense
since the I/O will be quiesced there, but the write-idle action can't
guarantee the same thing.
Thinking of (and documenting) MD_ALLOW_SB_UPDATE as "the holder of
reconfig_mutex promises not to make any changes which would exclude
superblocks from being written" might make it easier to accept the usage.
I am not sure it is safe to set the flag here, since write idle can't
prevent I/O from the filesystem, while mddev_suspend can guarantee that.
I would prefer to make the resync thread not wait forever here.
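For context, mddev_suspend() only holds ALLOW_SB_UPDATE across the window
in which in-flight I/O is drained and the personality is quiesced; a
paraphrased sketch (not verbatim md.c):

void mddev_suspend(struct mddev *mddev)
{
	...
	wake_up(&mddev->sb_wait);
	/* let md_check_recovery() update the superblock without reconfig_mutex */
	set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
	smp_mb__after_atomic();
	/* wait for in-flight I/O to drain, then quiesce the personality */
	wait_event(mddev->sb_wait, atomic_read(&mddev->active_io) == 0);
	mddev->pers->quiesce(mddev, 1);
	clear_bit_unlock(MD_ALLOW_SB_UPDATE, &mddev->flags);
	wait_event(mddev->sb_wait, !test_bit(MD_UPDATING_SB, &mddev->flags));
	...
}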
[...]
- sh = raid5_get_active_stripe(conf, new_sector, previous,
+ sh = raid5_get_active_stripe(conf, new_sector, previous, 0,
Mistake here (fourth argument added instead of third)
Thanks for checking.
[...]
Unfortunately, this patch did not fix the issue.
root@sloth:~/linux# cat /proc/$(pgrep md3_resync)/stack
[<0>] raid5_get_active_stripe+0x1e7/0x6b0
[<0>] raid5_sync_request+0x2a7/0x3d0
[<0>] md_do_sync.cold+0x3ee/0x97c
[<0>] md_thread+0xab/0x160
[<0>] kthread+0x11b/0x140
[<0>] ret_from_fork+0x22/0x30
which is (according to objdump -d -Sl drivers/md/raid5.o) at
https://elixir.bootlin.com/linux/v5.11-rc5/source/drivers/md/raid5.c#L735
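That is the wait in raid5_get_active_stripe(); approximately (condition
paraphrased from the v5.11 source, so possibly not verbatim):

	wait_event_lock_irq(conf->wait_for_stripe,
		!list_empty(conf->inactive_list + hash) &&
		(atomic_read(&conf->active_stripes)
		 < (conf->max_nr_stripes * 3 / 4)
		 || !test_bit(R5_INACTIVE_BLOCKED, &conf->cache_state)),
		*(conf->hash_locks + hash));

i.e. the resync thread sleeps until enough stripes are released back to
the inactive list.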
Isn't it still the case that the superblocks are not written, therefore
stripes are not processed, and therefore the number of active stripes
does not decrease? Who is expected to wake up the conf->wait_for_stripe
waiters?
Hmm, how about waking the waiter up in the while loop of raid5d?
@@ -6520,6 +6532,11 @@ static void raid5d(struct md_thread *thread)
 			md_check_recovery(mddev);
 			spin_lock_irq(&conf->device_lock);
 		}
+
+		if ((atomic_read(&conf->active_stripes)
+		     < (conf->max_nr_stripes * 3 / 4) ||
+		    (test_bit(MD_RECOVERY_INTR, &mddev->recovery))))
+			wake_up(&conf->wait_for_stripe);
 	}
 	pr_debug("%d stripes handled\n", handled);
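The idea being that raid5d, which is the thread that actually releases
stripes, would then kick conf->wait_for_stripe on each pass once
MD_RECOVERY_INTR is set, instead of relying on the active stripe count
ever dropping.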
If the issue still appears, then I will change the waiter to break out
just if MD_RECOVERY_INTR is set, something like this:
wait_event_lock_irq(conf->wait_for_stripe,
(test_bit(MD_RECOVERY_INTR, &mddev->recovery) && sync_req) ||
/* the previous condition */,
*(conf->hash_locks + hash));
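(Here sync_req would be a new flag or argument passed in from the resync
path, so that only the resync thread breaks out early while normal
writers keep waiting on the existing condition.)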
Thanks,
Guoqing