Re: [PATCH v5 01/14] md: don't ignore suspended array in md_check_recovery()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Feb 18, 2024 at 9:15 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> 在 2024/02/16 14:58, Xiao Ni 写道:
> > On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>
> >> From: Yu Kuai <yukuai3@xxxxxxxxxx>
> >>
> >> mddev_suspend() never stop sync_thread, hence it doesn't make sense to
> >> ignore suspended array in md_check_recovery(), which might cause
> >> sync_thread can't be unregistered.
> >>
> >> After commit f52f5c71f3d4 ("md: fix stopping sync thread"), following
> >> hang can be triggered by test shell/integrity-caching.sh:
> >
> > Hi Kuai
> >
> > After applying this patch, it's still stuck at mddev_suspend. Maybe
> > the deadlock can be fixed by other patches from the patch set. But
> > this patch can't fix this issue. If so, the comment is not right.
>
> This patch alone can't fix the problem that mddev_suspend() can stuck
> thoroughly, patches 1-4 will all be needed.
>
> Thanks,
> Kuai
>
> >
> >>
> >> 1) suspend the array:
> >> raid_postsuspend
> >>   mddev_suspend
> >>
> >> 2) stop the array:
> >> raid_dtr
> >>   md_stop
> >>    __md_stop_writes
> >>     stop_sync_thread
> >>      set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> >>      md_wakeup_thread_directly(mddev->sync_thread);
> >>      wait_event(..., !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
> >>
> >> 3) sync thread done:
> >> md_do_sync
> >>   set_bit(MD_RECOVERY_DONE, &mddev->recovery);
> >>   md_wakeup_thread(mddev->thread);
> >>
> >> 4) daemon thread can't unregister sync thread:
> >> md_check_recovery
> >>   if (mddev->suspended)
> >>     return; -> return directly
> >>   md_read_sync_thread
> >>   clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> >>   -> MD_RECOVERY_RUNNING can't be cleared, hence step 2 hang;
> >
> > I add some debug logs when stopping dmraid with lvremove command. The
> > step you mentioned are sequential but not async. The process is :
> > dev_remove->dm_destroy->__dm_destroy->dm_table_postsuspend_targets(raid_postsuspend)
> > -> dm_table_destroy(raid_dtr). It looks like mddev_suspend is waiting
> > for active_io to be zero.

The deadlock problem mentioned in this patch should not be right?

Regards
Xiao


> >
> > Best Regards
> > Xiao
> >
> >> This problem is not just related to dm-raid, fix it by ignoring
> >> suspended array in md_check_recovery(). And follow up patches will
> >> improve dm-raid better to frozen sync thread during suspend.
> >>
> >> Reported-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> >> Closes: https://lore.kernel.org/all/8fb335e-6d2c-dbb5-d7-ded8db5145a@xxxxxxxxxx/
> >> Fixes: 68866e425be2 ("MD: no sync IO while suspended")
> >> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> >> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
> >> ---
> >>   drivers/md/md.c | 3 ---
> >>   1 file changed, 3 deletions(-)
> >>
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index 2266358d8074..07b80278eaa5 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -9469,9 +9469,6 @@ static void md_start_sync(struct work_struct *ws)
> >>    */
> >>   void md_check_recovery(struct mddev *mddev)
> >>   {
> >> -       if (READ_ONCE(mddev->suspended))
> >> -               return;
> >> -
> >>          if (mddev->bitmap)
> >>                  md_bitmap_daemon_work(mddev);
> >>
> >> --
> >> 2.39.2
> >>
> >
> >
> > .
> >
>






[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux