On Sun, Feb 18, 2024 at 10:34 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > Hi, > > 在 2024/02/18 10:27, Xiao Ni 写道: > > On Sun, Feb 18, 2024 at 9:46 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > >> > >> Hi, > >> > >> 在 2024/02/18 9:33, Xiao Ni 写道: > >>> The deadlock problem mentioned in this patch should not be right? > >> > >> No, I think it's right. Looks like you are expecting other problems, > >> like mentioned in patch 6, to be fixed by this patch. > > > > Hi Kuai > > > > Could you explain why step1 and step2 from this comment can happen > > simultaneously? From the log, the process should be > > The process is : > > dev_remove->dm_destroy->__dm_destroy->dm_table_postsuspend_targets(raid_postsuspend) > > -> dm_table_destroy(raid_dtr). > > After suspending the array, it calls raid_dtr. So these two functions > > can't happen simultaneously. > > You're removing the target directly, however, dm can suspend the disk > directly, you can simplily: > > 1) dmsetup suspend xxx > 2) dmsetup remove xxx For dm-raid, the design of suspend stops sync thread first and then it calls mddev_suspend to suspend array. So I'm curious why the sync thread can still exit when array is suspended. I know the reason now. Because before f52f5c71f (md: fix stopping sync thread), the process is raid_postsuspend->md_stop_writes->__md_stop_writes (__md_stop_writes sets MD_RECOVERY_FROZEN). In patch f52f5c71f, it doesn't set MD_RECOVERY_FROZEN in __md_stop_writes anymore. The process changes to 1. raid_postsuspend->md_stop_writes->__md_stop_writes->stop_sync_thread (wait until MD_RECOVERY_RUNNING clears) 2. md thread -> md_check_recovery -> unregister_sync_thread -> md_reap_sync_thread (clears MD_RECOVERY_RUNNING, stop_sync_thread returns, md_reap_sync_thread sets MD_RECOVERY_NEEDED) 3. raid_postsuspend->mddev_suspend 4. md sync thread starts again because __md_stop_writes doesn't set MD_RECOVERY_FROZEN. It's the reason why we can see sync thread still happens when raid is suspended. So the patch fix this problem should: diff --git a/drivers/md/md.c b/drivers/md/md.c index 9e41a9aaba8b..666761466f02 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -6315,6 +6315,7 @@ static void md_clean(struct mddev *mddev) static void __md_stop_writes(struct mddev *mddev) { + set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); stop_sync_thread(mddev, true, false); del_timer_sync(&mddev->safemode_timer); Like other places which call stop_sync_thread, it needs to set the MD_RECOVERY_FROZEN bit. Regards Xiao > > Please also take a look at other patches, why step 1) can't stop sync > thread. > > Thanks, > Kuai > > > > > > >> > >> Noted that this patch just fix one case that MD_RECOVERY_RUNNING can't > >> be cleared, I you are testing this patch alone, please make sure that > >> you still triggered the exactly same case: > >> > >> - MD_RCOVERY_RUNNING can't be cleared while array is suspended. > > > > I'm not testing this patch. I want to understand the patch well. So I > > need to understand the issue first. I can't understand how this > > deadlock (step1,step2) happens. > > > > Regards > > Xiao > >> > >> Thanks, > >> Kuai > >> > > > > . > > >