On Wed, Feb 24, 2021 at 1:26 AM Guoqing Jiang
<guoqing.jiang@xxxxxxxxxxxxxxx> wrote:
>
>
> On 2/24/21 10:09, Song Liu wrote:
> > On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote:
> >>
> >> [+cc Donald]
> >>
> >> On 13.02.21 at 01:49, Guoqing Jiang wrote:
> >>> Unregister sync_thread doesn't need to hold reconfig_mutex since it
> >>> doesn't reconfigure array.
> >>>
> >>> And it could cause deadlock problem for raid5 as follows:
> >>>
> >>> 1. process A tried to reap sync thread with reconfig_mutex held after echo
> >>>    idle to sync_action.
> >>> 2. raid5 sync thread was blocked if there were too many active stripes.
> >>> 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer)
> >>>    which causes the number of active stripes can't be decreased.
> >>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
> >>>    to hold reconfig_mutex.
> >>>
> >>> More details in the link:
> >>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@xxxxxxxxxxxxx/T/#t
> >>>
> >>> And add one parameter to md_reap_sync_thread since it could be called by
> >>> dm-raid which doesn't hold reconfig_mutex.
> >>>
> >>> Reported-and-tested-by: Donald Buczek <buczek@xxxxxxxxxxxxx>
> >>> Signed-off-by: Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx>
> >
> > I don't really like this fix. But I haven't got a better (and not too
> > complicated) alternative.
> >
> >>> ---
> >>> V2:
> >>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
> >>>
> >>>  drivers/md/dm-raid.c |  2 +-
> >>>  drivers/md/md.c      | 14 +++++++++-----
> >>>  drivers/md/md.h      |  2 +-
> >>>  3 files changed, 11 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> >>> index cab12b2..0c4cbba 100644
> >>> --- a/drivers/md/dm-raid.c
> >>> +++ b/drivers/md/dm-raid.c
> >>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
> >>>       if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
> >>>               if (mddev->sync_thread) {
> >>>                       set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> >>> -                     md_reap_sync_thread(mddev);
> >>> +                     md_reap_sync_thread(mddev, false);
> >
> > I think we can add mddev_lock() and mddev_unlock() here and then we don't
> > need the extra parameter?
> >
>
> I thought it too, but I would prefer get the input from DM people first.
>
> @ Mike or Alasdair

Hi Mike and Alasdair,

Could you please comment on this option: adding mddev_lock() and
mddev_unlock() to raid_message() around md_reap_sync_thread()?

Thanks,
Song
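
For reference, the locking contract behind that option can be sketched as a
small userspace model. This is an untested illustration under stated
assumptions, not kernel code: struct mddev_model and the model_*() helpers are
hypothetical stand-ins for struct mddev, mddev_lock(), mddev_unlock() and
md_reap_sync_thread(), and the mutex is modelled as a plain flag. It assumes
the reap helper expects its caller to hold the lock and drops it only while
waiting for the sync thread to exit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct mddev_model {
	bool reconfig_mutex_held;	/* models mddev->reconfig_mutex */
	bool sync_thread;		/* models mddev->sync_thread != NULL */
	bool recovery_intr;		/* models the MD_RECOVERY_INTR bit */
	bool waited_unlocked;		/* set if the wait ran without the lock */
};

/* Models mddev_lock(); the real helper can fail, this one never does. */
static int model_lock(struct mddev_model *m)
{
	assert(!m->reconfig_mutex_held);	/* locking twice would deadlock */
	m->reconfig_mutex_held = true;
	return 0;
}

/* Models mddev_unlock(). */
static void model_unlock(struct mddev_model *m)
{
	assert(m->reconfig_mutex_held);
	m->reconfig_mutex_held = false;
}

/*
 * Models md_reap_sync_thread() under the assumption that every caller
 * holds the lock, so no extra parameter is needed.
 */
static void model_reap_sync_thread(struct mddev_model *m)
{
	assert(m->reconfig_mutex_held);

	/* Drop the lock so other paths can take it during the wait. */
	model_unlock(m);
	m->waited_unlocked = !m->reconfig_mutex_held;
	m->sync_thread = false;		/* stands in for stopping the thread */
	model_lock(m);
}

/* Models the "idle"/"frozen" branch of raid_message() with the lock added. */
static int model_raid_message_idle(struct mddev_model *m)
{
	if (m->sync_thread) {
		int ret;

		m->recovery_intr = true;
		ret = model_lock(m);
		if (ret)
			return ret;
		model_reap_sync_thread(m);
		model_unlock(m);
	}
	return 0;
}
```

With the lock taken in the dm-raid path, both callers would enter the reap
helper in the same state, which is what would make the extra parameter
unnecessary. Whether taking reconfig_mutex in raid_message() is safe for DM
is exactly the open question above.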