Re: [PATCH V2] md: don't unregister sync_thread with reconfig_mutex held

On Wed, Feb 24, 2021 at 1:26 AM Guoqing Jiang
<guoqing.jiang@xxxxxxxxxxxxxxx> wrote:
>
>
>
> On 2/24/21 10:09, Song Liu wrote:
> > On Mon, Feb 15, 2021 at 3:08 AM Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote:
> >>
> >> [+cc Donald]
> >>
> >> Am 13.02.21 um 01:49 schrieb Guoqing Jiang:
> >>> Unregistering sync_thread doesn't need to hold reconfig_mutex since it
> >>> doesn't reconfigure the array.
> >>>
> >>> Holding it could also cause a deadlock for raid5 as follows:
> >>>
> >>> 1. Process A tried to reap the sync thread with reconfig_mutex held after
> >>>      echoing idle to sync_action.
> >>> 2. The raid5 sync thread was blocked if there were too many active stripes.
> >>> 3. SB_CHANGE_PENDING was set (because write IO came from the upper layer),
> >>>      which prevents the number of active stripes from decreasing.
> >>> 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
> >>>      to take reconfig_mutex.
> >>>
> >>> More details in the link:
> >>> https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@xxxxxxxxxxxxx/T/#t
> >>>
> >>> And add one parameter to md_reap_sync_thread since it could be called by
> >>> dm-raid which doesn't hold reconfig_mutex.
> >>>
> >>> Reported-and-tested-by: Donald Buczek <buczek@xxxxxxxxxxxxx>
> >>> Signed-off-by: Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx>
> >
> > I don't really like this fix. But I haven't got a better (and not too
> > complicated) alternative.
> >
> >>> ---
> >>> V2:
> >>> 1. add one parameter to md_reap_sync_thread per Jack's suggestion.
> >>>
> >>>    drivers/md/dm-raid.c |  2 +-
> >>>    drivers/md/md.c      | 14 +++++++++-----
> >>>    drivers/md/md.h      |  2 +-
> >>>    3 files changed, 11 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> >>> index cab12b2..0c4cbba 100644
> >>> --- a/drivers/md/dm-raid.c
> >>> +++ b/drivers/md/dm-raid.c
> >>> @@ -3668,7 +3668,7 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
> >>>        if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
> >>>                if (mddev->sync_thread) {
> >>>                        set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> >>> -                     md_reap_sync_thread(mddev);
> >>> +                     md_reap_sync_thread(mddev, false);
> >
> > I think we can add mddev_lock() and mddev_unlock() here and then we don't
> > need the extra parameter?
> >
>
> I thought of that too, but I would prefer to get input from the DM people first.
>
> @ Mike or Alasdair

Hi Mike and Alasdair,

Could you please comment on this option: adding mddev_lock() and mddev_unlock()
to raid_message() around md_reap_sync_thread()?
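
For concreteness, the call ordering in that option could look roughly like
the user-space model below. This is only a sketch, not kernel code and not a
proposed patch: a pthread mutex stands in for mddev->reconfig_mutex, and every
name ending in _model (plus the lock_held and reaped_locked fields) is
hypothetical and exists only for illustration. It shows the lock being taken
around the reap so the callee needs no extra parameter; it does not model the
raid5 stripe deadlock itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified stand-in for struct mddev (illustration only). */
struct mddev_model {
	pthread_mutex_t reconfig_mutex; /* stands in for mddev->reconfig_mutex */
	bool lock_held;                 /* tracks whether the mutex is held */
	bool sync_thread;               /* true while a sync thread is registered */
	bool recovery_intr;             /* models the MD_RECOVERY_INTR bit */
	int reaped_locked;              /* number of reaps done under the mutex */
};

void mddev_model_init(struct mddev_model *m, bool has_sync_thread)
{
	pthread_mutex_init(&m->reconfig_mutex, NULL);
	m->lock_held = false;
	m->sync_thread = has_sync_thread;
	m->recovery_intr = false;
	m->reaped_locked = 0;
}

/* Models mddev_lock(): returns 0 on success, nonzero on failure. */
int mddev_lock_model(struct mddev_model *m)
{
	int err = pthread_mutex_lock(&m->reconfig_mutex);

	if (!err)
		m->lock_held = true;
	return err;
}

/* Models mddev_unlock(). */
void mddev_unlock_model(struct mddev_model *m)
{
	m->lock_held = false;
	pthread_mutex_unlock(&m->reconfig_mutex);
}

/* Models md_reap_sync_thread() with its single-argument signature. */
void md_reap_sync_thread_model(struct mddev_model *m)
{
	assert(m->lock_held); /* in this model the caller must hold the mutex */
	m->reaped_locked++;
	m->sync_thread = false;
}

/* Models the "idle"/"frozen" branch of raid_message(). */
int raid_message_idle_model(struct mddev_model *m)
{
	int err;

	if (!m->sync_thread)
		return 0;

	m->recovery_intr = true;

	err = mddev_lock_model(m);
	if (err)
		return err;
	md_reap_sync_thread_model(m);
	mddev_unlock_model(m);
	return 0;
}
```

In the real code mddev_lock() is interruptible, so the caller in
raid_message() would have to decide what to return if taking the lock fails;
the model simply passes the error through.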

Thanks,
Song

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel



