Re: [PATCH RFC 1/4] dm-raid/md: Clear MD_RECOVERY_WAIT when stopping dmraid

Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> · Mon, 26 Feb 2024 09:31:35 +0800

Hi,

在 2024/02/23 21:20, Xiao Ni 写道:
On Fri, Feb 23, 2024 at 11:32 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

在 2024/02/20 23:30, Xiao Ni 写道:
MD_RECOVERY_WAIT is used by dmraid to delay reshape process by patch
commit 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock").
Before patch commit f52f5c71f3d4b ("md: fix stopping sync thread")
dmraid stopped sync thread directy by calling md_reap_sync_thread.
After this patch dmraid stops sync thread asynchronously as md does.
This is right. Now the dmraid stop process is like this:

1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread.
stop_sync_thread sets MD_RECOVERY_INTR and wait until MD_RECOVERY_RUNNING
is cleared
2. md_do_sync finds MD_RECOVERY_WAIT is set and return. (This is the
root cause for this deadlock. We hope md_do_sync can set MD_RECOVERY_DONE)
3. md thread calls md_check_recovery (This is the place to reap sync
thread. Because MD_RECOVERY_DONE is not set. md thread can't reap sync
thread)
4. raid_dtr stops/free struct mddev and release dmraid related resources

dmraid only sets MD_RECOVERY_WAIT but doesn't clear it. It needs to clear
this bit when stopping the dmraid before stopping sync thread.

But the deadlock still can happen sometimes even MD_RECOVERY_WAIT is
cleared before stopping sync thread. It's the reason stop_sync_thread only
wakes up task. If the task isn't running, it still needs to wake up sync
thread too.

This deadlock can be reproduced 100% by these commands:
modprobe brd rd_size=34816 rd_nr=5
while [ 1 ]; do
vgcreate test_vg /dev/ram*
lvcreate --type raid5 -L 16M -n test_lv test_vg
lvconvert -y --stripes 4 /dev/test_vg/test_lv
vgremove test_vg -ff
sleep 1
done

Fixes: 644e2537fdc7 ("dm raid: fix stripe adding reshape deadlock")
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Xiao Ni <xni@xxxxxxxxxx>

I'm not sure about this change, I think MD_RECOVERY_WAIT is hacky and
really breaks how sync_thread is working, it should just go away soon,
once we make sure sync_thread can't be registered before pers->start()
is done.

Hi Kuai

I just want to get to a stable state without changing any existing
logic. After fixing these regressions, we can consider other changes.
(e.g. remove MD_RECOVERY_WAIT. allow sync thread start/stop when array
is suspend. )  I talked with Heinz yesterday, for dm-raid, it can't
allow any io to happen including sync thread when array is suspended.

So, are you still thinking that my patchset will allow this for dm-raid?

I already explain a lot why patch 1 from my set is okay, and why the set
doesn't introduce any behaviour change like you said in this patch 0:

"Kuai's patch set breaks some rules".

The only thing that will change is that for md/raid, sync_thrad can
start for suspended array, however, I don't think this will be a problem
because sync_thread can be running for suspended array already, and
'MD_RECOVERY_FROZEN' is already used to prevent sync_thread to start.

Thanks,
Kuai


Regards
Xiao

Thanks,
Kuai
---
   drivers/md/dm-raid.c | 2 ++
   drivers/md/md.c      | 1 +
   2 files changed, 3 insertions(+)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index eb009d6bb03a..325767c1140f 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3796,6 +3796,8 @@ static void raid_postsuspend(struct dm_target *ti)
       struct raid_set *rs = ti->private;

       if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
+             if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
+                     clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
               /* Writes have to be stopped before suspending to avoid deadlocks. */
               if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
                       md_stop_writes(&rs->md);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2266358d8074..54790261254d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4904,6 +4904,7 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
        * never happen
        */
       md_wakeup_thread_directly(mddev->sync_thread);
+     md_wakeup_thread(mddev->sync_thread);
       if (work_pending(&mddev->sync_work))
               flush_work(&mddev->sync_work);




.