+ dm-devel@xxxxxxxxxxxxxxx On Fri, Nov 24, 2023 at 12:00 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > From: Yu Kuai <yukuai3@xxxxxxxxxx> > > After commit db5e653d7c9f ("md: delay choosing sync action to > md_start_sync()"), md_start_sync() will hold 'reconfig_mutex', however, > in order to make sure event_work is done, __md_stop() will flush > workqueue with reconfig_mutex grabbed, hence if sync_work is still > pending, deadlock will be triggered. > > Fortunately, former pacthes to fix stopping sync_thread already make sure > all sync_work is done already, hence such deadlock is not possible > anymore. However, in order not to cause confusions for people by this > implicit dependency, delay flushing event_work to dm-raid where > 'reconfig_mutex' is not held, and add some comments to emphasize that > the workqueue can't be flushed with 'reconfig_mutex'. > > Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx> > --- > drivers/md/dm-raid.c | 3 +++ > drivers/md/md.c | 11 ++++++++--- > 2 files changed, 11 insertions(+), 3 deletions(-) > > diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c > index a4692f8f98ee..51f15c20f621 100644 > --- a/drivers/md/dm-raid.c > +++ b/drivers/md/dm-raid.c > @@ -3317,6 +3317,9 @@ static void raid_dtr(struct dm_target *ti) > mddev_lock_nointr(&rs->md); > md_stop(&rs->md); > mddev_unlock(&rs->md); > + > + if (work_pending(&rs->md.event_work)) > + flush_work(&rs->md.event_work); > raid_set_free(rs); > } > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index fb8a7d1eebee..86efc9c2ae56 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -82,6 +82,14 @@ static struct module *md_cluster_mod; > > static DECLARE_WAIT_QUEUE_HEAD(resync_wait); > static struct workqueue_struct *md_wq; > + > +/* > + * This workqueue is used for sync_work to register new sync_thread, and for > + * del_work to remove rdev, and for event_work that is only set by dm-raid. > + * > + * Noted that sync_work will grab reconfig_mutex, hence never flush this > + * workqueue whith reconfig_mutex grabbed. > + */ > static struct workqueue_struct *md_misc_wq; > struct workqueue_struct *md_bitmap_wq; > > @@ -6328,9 +6336,6 @@ static void __md_stop(struct mddev *mddev) > struct md_personality *pers = mddev->pers; > md_bitmap_destroy(mddev); > mddev_detach(mddev); > - /* Ensure ->event_work is done */ > - if (mddev->event_work.func) > - flush_workqueue(md_misc_wq); > spin_lock(&mddev->lock); > mddev->pers = NULL; > spin_unlock(&mddev->lock); > -- > 2.39.2 >