Hi, Sorry for the late reply. The first two patches of the set look good, so I applied them to md-tmp-6.9 branch. However, this one needs a respin. On Thu, Dec 28, 2023 at 4:58 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > From: Yu Kuai <yukuai3@xxxxxxxxxx> > > Before refactoring idle and frozen from action_store, interruptible apis > is used so that hungtask warning won't be triggered if it takes too long > to finish idle/frozen sync_thread. So change to use interruptible apis. This paragraph is confusing. Please rephrase it. > > In order not to make stop_sync_thread() more complicated, factor out a > helper prepare_to_stop_sync_thread() to replace stop_sync_thread(). > > Also return error to user if idle/frozen_sync_thread() failed, otherwise > user will be misleaded. s/misleaded/misled/ > > Fixes: 130443d60b1b ("md: refactor idle/frozen_sync_thread() to fix deadlock") > Fixes: 8e8e2518fcec ("md: Close race when setting 'action' to 'idle'.") Please add more information about what is being fixed here, so that we can make a clear decision on whether the fix needs to be back ported to stable kernels. > Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx> > --- > drivers/md/md.c | 105 ++++++++++++++++++++++++++++++------------------ > 1 file changed, 67 insertions(+), 38 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 60f99768a1a9..9ea05de79fe4 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -4846,26 +4846,34 @@ action_show(struct mddev *mddev, char *page) > return sprintf(page, "%s\n", type); > } > > +static bool sync_thread_stopped(struct mddev *mddev, int *sync_seq) I think we need a comment for this. > +{ > + if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) > + return true; > + > + if (sync_seq && *sync_seq != atomic_read(&mddev->sync_seq)) > + return true; > + > + return false; > +} > + > /** > - * stop_sync_thread() - wait for sync_thread to stop if it's running. > + * prepare_to_stop_sync_thread() - prepare to stop sync_thread if it's running. > * @mddev: the array. > - * @locked: if set, reconfig_mutex will still be held after this function > - * return; if not set, reconfig_mutex will be released after this > - * function return. > - * @check_seq: if set, only wait for curent running sync_thread to stop, noted > - * that new sync_thread can still start. > + * @unlock: whether or not caller want to release reconfig_mutex if > + * sync_thread is not running. > + * > + * Return true if sync_thread is running, release reconfig_mutex and do > + * preparatory work to stop sync_thread, caller should wait for > + * sync_thread_stopped() to return true. Return false if sync_thread is not > + * running, reconfig_mutex will be released if @unlock is set. > */ I found prepare_to_stop_sync_thread very hard to reason. Please try to rephrase the comment or refactor the code. Maybe it makes sense to put the following logic and its variations to a separate function: if (prepare_to_stop_sync_thread(mddev, false)) { wait_event(resync_wait, sync_thread_stopped(mddev, NULL)); mddev_lock_nointr(mddev); } Thanks, Song > -static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq) > +static bool prepare_to_stop_sync_thread(struct mddev *mddev, bool unlock) > { > - int sync_seq; [...]