Re: [PATCH v5 01/14] md: don't ignore suspended array in md_check_recovery()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

在 2024/02/16 14:58, Xiao Ni 写道:
On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

From: Yu Kuai <yukuai3@xxxxxxxxxx>

mddev_suspend() never stop sync_thread, hence it doesn't make sense to
ignore suspended array in md_check_recovery(), which might cause
sync_thread can't be unregistered.

After commit f52f5c71f3d4 ("md: fix stopping sync thread"), following
hang can be triggered by test shell/integrity-caching.sh:

Hi Kuai

After applying this patch, it's still stuck at mddev_suspend. Maybe
the deadlock can be fixed by other patches from the patch set. But
this patch can't fix this issue. If so, the comment is not right.

This patch alone can't fix the problem that mddev_suspend() can stuck
thoroughly, patches 1-4 will all be needed.

Thanks,
Kuai



1) suspend the array:
raid_postsuspend
  mddev_suspend

2) stop the array:
raid_dtr
  md_stop
   __md_stop_writes
    stop_sync_thread
     set_bit(MD_RECOVERY_INTR, &mddev->recovery);
     md_wakeup_thread_directly(mddev->sync_thread);
     wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))

3) sync thread done:
md_do_sync
  set_bit(MD_RECOVERY_DONE, &mddev->recovery);
  md_wakeup_thread(mddev->thread);

4) daemon thread can't unregister sync thread:
md_check_recovery
  if (mddev->suspended)
    return; -> return directly
  md_read_sync_thread
  clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
  -> MD_RECOVERY_RUNNING can't be cleared, hence step 2 hang;

I add some debug logs when stopping dmraid with lvremove command. The
step you mentioned are sequential but not async. The process is :
dev_remove->dm_destroy->__dm_destroy->dm_table_postsuspend_targets(raid_postsuspend)
-> dm_table_destroy(raid_dtr). It looks like mddev_suspend is waiting
for active_io to be zero.

Best Regards
Xiao

This problem is not just related to dm-raid, fix it by ignoring
suspended array in md_check_recovery(). And follow up patches will
improve dm-raid better to frozen sync thread during suspend.

Reported-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
Closes: https://lore.kernel.org/all/8fb335e-6d2c-dbb5-d7-ded8db5145a@xxxxxxxxxx/
Fixes: 68866e425be2 ("MD: no sync IO while suspended")
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
---
  drivers/md/md.c | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2266358d8074..07b80278eaa5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9469,9 +9469,6 @@ static void md_start_sync(struct work_struct *ws)
   */
  void md_check_recovery(struct mddev *mddev)
  {
-       if (READ_ONCE(mddev->suspended))
-               return;
-
         if (mddev->bitmap)
                 md_bitmap_daemon_work(mddev);

--
2.39.2



.






[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux