Re: [PATCH RFC 1/1] md: Use pers->quiesce in mddev_suspend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 5, 2024 at 5:22 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> 在 2024/11/05 17:01, Xiao Ni 写道:
> > On Tue, Nov 5, 2024 at 4:29 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> 在 2024/11/05 16:16, Mariusz Tkaczyk 写道:
> >>> On Tue,  5 Nov 2024 15:57:33 +0800
> >>> Xiao Ni <xni@xxxxxxxxxx> wrote:
> >>>
> >>>> One customer reports a bug: raid5 is hung when changing thread cnt
> >>>> while resync is running. The stripes are all in conf->handle_list
> >>>> and new threads can't handle them.
> >>>
> >>> Is issue fixed with this patch? Is is missing here :)
> >>>>
> >>>> Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes
> >>>> pers->quiesce from mddev_suspend/resume, then we can't guarantee sync
> >>>> requests finish in suspend operation. One personality knows itself the
> >>>> best. So pers->quiesce is a proper way to let personality quiesce.
> >>>
> >>>>
> >>>> Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
> >>>> Signed-off-by: Xiao Ni <xni@xxxxxxxxxx>
> >>>> ---
> >>>>    drivers/md/md.c | 6 ++++++
> >>>>    1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >>>> index 67108c397c5a..7409ecb2df68 100644
> >>>> --- a/drivers/md/md.c
> >>>> +++ b/drivers/md/md.c
> >>>> @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible)
> >>>>               return err;
> >>>>       }
> >>>>
> >>>> +    if (mddev->pers)
> >>>> +            mddev->pers->quiesce(mddev, 1);
> >>>> +
> >>>
> >>> Shouldn't it be implemented as below? According to b39f35ebe86d, some levels are
> >>> not implementing this?
> >>>
> >>> +     if (mddev->pers && mddev->pers->quiesce)
> >>> +             mddev->pers->quiesce(mddev, 1);
> >>
> >> It's fine, the fops is never NULL, just some levels points to an empty
> >> function.
> >>
> >> The tricky part here is that accessing "mddev->pers" is not safe here,
> >> 'reconfig_mutex' is not held in mddev_suspend(), and can concurrent
> >> with, for example, change levels. :(
> >
> > Hi Kuai
> >
> > Now mddev->suspended is protected by mddev->suspend_mutex. It should
> > can avoid the problem you mentioned above? level_store calls
> > mddev_suspend and mddev->suspended is set to mddev->suspended+1. So
> > other paths will return because mddev->suspended is not 0.
>
> Yeah, level store is just an wrong example, key point here is that
> mddev->pers is not protected by 'suspend_mutex'. Other places are
> md_run() and md_stop(), these path doesn't hold 'suspend_mutex' right?
> And they look like can concurrent with suspend, for example,
> suspend_lo_store()

Yes, thanks for pointing out this. I need to think more about this.

>
> Did you consider just fail raid5_store_group_thread_cnt() if sync_thread
> is active?

It's one method, but not a good one.

>
> Or call ->quiesce() directly here with 'reconfig_mutex' held before
> suspend?

I'll think this way. Thanks!

Regards
Xiao
>
> Thanks,
> Kuai
>
> >
> > Regards
> > Xiao
> >>
> >> Thanks,
> >> Kuai
> >>
> >>>
> >>> Is it reproducible with upstream kernel?
> >>>
> >>> Thanks,
> >>> Mariusz
> >>>
> >>> .
> >>>
> >>
> >
> >
> > .
> >
>






[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux