From: Yu Kuai <yukuai3@xxxxxxxxxx>

After commit db5e653d7c9f ("md: delay choosing sync action to
md_start_sync()"), md_start_sync() takes 'reconfig_mutex'. However, to
make sure event_work is done, __md_stop() flushes the workqueue with
'reconfig_mutex' held, so if sync_work is still pending, a deadlock is
triggered:

md_stop                         md_start_sync
 mddev_lock
                                 mddev_lock
 flush_workqueue
 -> deadlock

Currently, __md_stop() is the only place that flushes the workqueue with
'reconfig_mutex' held, and event_work is only used by dm-raid. Instead of
splitting sync_work out of the workqueue, fix this problem the easy way
by moving the flush to dm-raid, where 'reconfig_mutex' is not held. This
is safe because do_table_event() doesn't relate to mdadm and can be
called after md_stop().

Fixes: db5e653d7c9f ("md: delay choosing sync action to md_start_sync()")
Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
Signed-off-by: Yu Kuai <yukuai1@xxxxxxxxxxxxxxx>
---
 drivers/md/dm-raid.c | 3 +++
 drivers/md/md.c      | 3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index a4692f8f98ee..51f15c20f621 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3317,6 +3317,9 @@ static void raid_dtr(struct dm_target *ti)
 	mddev_lock_nointr(&rs->md);
 	md_stop(&rs->md);
 	mddev_unlock(&rs->md);
+
+	if (work_pending(&rs->md.event_work))
+		flush_work(&rs->md.event_work);
 	raid_set_free(rs);
 }
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 35f3dd7db369..8f5df249448d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6378,9 +6378,6 @@ static void __md_stop(struct mddev *mddev)
 	struct md_personality *pers = mddev->pers;
 	md_bitmap_destroy(mddev);
 	mddev_detach(mddev);
-	/* Ensure ->event_work is done */
-	if (mddev->event_work.func)
-		flush_workqueue(md_misc_wq);
 	spin_lock(&mddev->lock);
 	mddev->pers = NULL;
 	spin_unlock(&mddev->lock);
-- 
2.39.2