Re: [PATCH 5/7] md: fix deadlock in shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh

Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> · Thu, 18 Jan 2024 09:51:16 +0800

Hi,

On 2024/01/18 2:21, Mikulas Patocka wrote:
This commit fixes a deadlock in the LVM2 test
shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh

When MD_RECOVERY_WAIT is set or when md_is_rdwr(mddev) is false, the
function md_do_sync returns without setting MD_RECOVERY_DONE. Thus,
stop_sync_thread waits for the flag MD_RECOVERY_DONE indefinitely.
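
The early-return paths described above can be sketched as a small userspace model. This is an illustration only, not kernel code: the REC_* bit values, model_md_do_sync() and waiter_can_finish() are hypothetical names invented for the sketch, and the read-only branch follows the `!md_is_rdwr(mddev)` check in the diff further down. The `patched` argument selects the behaviour this patch proposes.

```c
#include <stdbool.h>

/* Hypothetical bit values for this model only; not the kernel's. */
enum {
	REC_WAIT = 1u << 0,	/* stands in for MD_RECOVERY_WAIT */
	REC_INTR = 1u << 1,	/* stands in for MD_RECOVERY_INTR */
	REC_DONE = 1u << 2,	/* stands in for MD_RECOVERY_DONE */
};

/*
 * Model of the checks at the top of md_do_sync().
 * Returns the updated flag word. With patched == false the two
 * early returns leave REC_DONE clear.
 */
static unsigned model_md_do_sync(unsigned recovery, bool rdwr, bool patched)
{
	if ((recovery & REC_DONE) || (recovery & REC_WAIT)) {
		if (patched && (recovery & REC_INTR))
			recovery |= REC_DONE;
		return recovery;
	}
	if (!rdwr) {	/* read-only array: never try to sync */
		recovery |= REC_INTR;
		if (patched)
			recovery |= REC_DONE;
		return recovery;
	}
	/* Normal sync path elided; in this model it always completes. */
	return recovery | REC_DONE;
}

/* A waiter that needs REC_DONE can only finish once the bit is set. */
static bool waiter_can_finish(unsigned recovery)
{
	return (recovery & REC_DONE) != 0;
}
```

In this model, an interrupted thread with REC_WAIT set, or a read-only array, leaves the waiter stuck unless `patched` is true.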

Also, md_wakeup_thread_directly does nothing if the thread is waiting in
md_thread on thread->wqueue (it wakes the thread up, the thread would
check THREAD_WAKEUP and go to sleep again without doing anything). So,
this commit introduces a call to md_wakeup_thread from
md_wakeup_thread_directly.
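
The wakeup argument above can be illustrated with a deliberately simplified, single-threaded model. All names here (struct model_thread, model_wake_directly(), model_md_wakeup_thread(), model_thread_woken()) are hypothetical, and timeouts and stop/park requests are ignored. The point is only that a waiter which rechecks a flag after being woken does nothing if the waker did not set that flag.

```c
#include <stdbool.h>

struct model_thread {
	bool wakeup_flag;	/* stands in for THREAD_WAKEUP */
	int runs;		/* how many times the handler ran */
};

/* Models a bare wake_up_process(): the task is woken, no flag is set. */
static void model_wake_directly(struct model_thread *t)
{
	(void)t;
}

/* Models md_wakeup_thread(): set the flag before waking the waiter. */
static void model_md_wakeup_thread(struct model_thread *t)
{
	t->wakeup_flag = true;
}

/*
 * What the waiter does once woken: recheck the flag, and either go
 * back to sleep (return false) or clear it and run the handler.
 */
static bool model_thread_woken(struct model_thread *t)
{
	if (!t->wakeup_flag)
		return false;
	t->wakeup_flag = false;
	t->runs++;
	return true;
}
```

Under these assumptions, a direct wake leaves `runs` unchanged, while a wake that first sets the flag lets the handler run once.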

task:lvm             state:D stack:0     pid:46322 tgid:46322 ppid:46079  flags:0x00004002
Call Trace:
  <TASK>
  __schedule+0x228/0x570
  schedule+0x29/0xa0
  schedule_timeout+0x6a/0xd0
  ? timer_shutdown_sync+0x10/0x10
  stop_sync_thread+0x197/0x1c0 [md_mod]
  ? housekeeping_test_cpu+0x30/0x30
  ? table_deps+0x1b0/0x1b0 [dm_mod]
  __md_stop_writes+0x10/0xd0 [md_mod]
  md_stop_writes+0x18/0x30 [md_mod]
  raid_postsuspend+0x32/0x40 [dm_raid]
  dm_table_postsuspend_targets+0x34/0x50 [dm_mod]
  dm_suspend+0xc4/0xd0 [dm_mod]
  dev_suspend+0x186/0x2d0 [dm_mod]
  ? table_deps+0x1b0/0x1b0 [dm_mod]
  ctl_ioctl+0x2e1/0x570 [dm_mod]
  dm_ctl_ioctl+0x5/0x10 [dm_mod]
  __x64_sys_ioctl+0x85/0xa0
  do_syscall_64+0x5d/0x1a0
  entry_SYSCALL_64_after_hwframe+0x46/0x4e

Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Cc: stable@xxxxxxxxxxxxxxx	# v6.7

---
  drivers/md/md.c    |    8 +++++++-
  drivers/md/raid5.c |    4 ++++
  2 files changed, 11 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/md/md.c
===================================================================

--- linux-2.6.orig/drivers/md/md.c
+++ linux-2.6/drivers/md/md.c
@@ -8029,6 +8029,8 @@ static void md_wakeup_thread_directly(st
  	if (t)
  		wake_up_process(t->tsk);
  	rcu_read_unlock();
+
+	md_wakeup_thread(thread);

This is not correct. I have already explained (and it is also in the
comments) what md_wakeup_thread_directly() is supposed to do.
  }
  
  void md_wakeup_thread(struct md_thread __rcu *thread)
@@ -8777,10 +8779,14 @@ void md_do_sync(struct md_thread *thread
  
  	/* just incase thread restarts... */
  	if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
-	    test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
+	    test_bit(MD_RECOVERY_WAIT, &mddev->recovery)) {
+		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+			set_bit(MD_RECOVERY_DONE, &mddev->recovery);

If you set MD_RECOVERY_DONE here, sync_thread will be unregistered, and I
don't think this is the expected behaviour. Only dm-raid uses this
flag, and rs_start_reshape() already explains that it wants
sync_thread to keep working until the table gets reloaded.

  		return;
+	}
  	if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
  		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
+		set_bit(MD_RECOVERY_DONE, &mddev->recovery);

This change looks reasonable.

Thanks,
Kuai

  		return;
  	}
  
