Re: [PATCH] Don't jump to compute_result state from check_result state

Song Liu <liu.song.a23@xxxxxxxxx> · Tue, 16 Apr 2019 10:40:36 -0700

On Mon, Apr 15, 2019 at 1:53 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
>
>
>
> On 04/15/2019 03:24 PM, Song Liu wrote:
> > On Mon, Apr 15, 2019 at 10:54 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
> >>
> >>
> >> On 04/15/2019 01:44 PM, Song Liu wrote:
> >>> Hi Nigel,
> >>>
> >>> On Thu, Apr 11, 2019 at 10:21 AM Song Liu <liu.song.a23@xxxxxxxxx> wrote:
> >>>> On Tue, Apr 9, 2019 at 12:26 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
> >>>>> On 4/9/19 2:09 PM, John Stoffel wrote:
> >>>>>>>>>>> "Dan" == Dan Williams <dan.j.williams@xxxxxxxxx> writes:
> >>>>>> Dan> On Mon, Apr 8, 2019 at 4:18 PM Song Liu <liu.song.a23@xxxxxxxxx> wrote:
> >>>>>> Dan> [..]
> >>>>>>>>>> To trigger this issue, you not only need a failed disk but to also
> >>>>>>>>>> perform a scrubbing operation.  The customer's systems both crashed
> >>>>>>>>>> early Sunday morning when the raid-check script is run by default from cron.
> >>>>>>>>> Ok, I follow this, but I come to a different answer on the required
> >>>>>>>>> fix. I think it is simply the following to preserve the writeback
> >>>>>>>>> action after the parity check, because we need the failed Q slot to be
> >>>>>>>>> written back if we're recovering. P will be not up-to-date because it
> >>>>>>>>> was checked with the good disks, but sh->ops.zero_sum_result will be
> >>>>>>>>> 0, so that will skip the writeback of a !uptodate P value.
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >>>>>>>>> index c033bfcb209e..e2eb59289346 100644
> >>>>>>>>> --- a/drivers/md/raid5.c
> >>>>>>>>> +++ b/drivers/md/raid5.c
> >>>>>>>>> @@ -4187,7 +4187,6 @@ static void handle_parity_checks6(struct r5conf
> >>>>>>>>> *conf, struct stripe_head *sh,
> >>>>>>>>>                    /* now write out any block on a failed drive,
> >>>>>>>>>                     * or P or Q if they were recomputed
> >>>>>>>>>                     */
> >>>>>>>>> -               BUG_ON(s->uptodate < disks - 1); /* We don't need Q to
> >>>>>>>>> recover */
> >>>>>>>> Thanks Dan!
> >>>>>>>>
> >>>>>>>> Would it make sense to rework the check as
> >>>>>>>>
> >>>>>>>> BUG_ON(s->uptodate < disks - 2);
> >>>>>> Dan> I think the problem is that any 'uptodate' vs 'disks' check is
> >>>>>> Dan> not precise enough in this path. What might be better is to put
> >>>>>> Dan> "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the devices that
> >>>>>> Dan> might try to kick off writes and then skip the action. Better to
> >>>>>> Dan> prevent the raid driver from taking unexpected action *and* keep
> >>>>>> Dan> the system alive vs killing the machine with BUG_ON.
> >>>>>>
> >>>>>> Dan> BUG_ON has fallen out of favor for exception reporting since
> >>>>>> Dan> those assertions were introduced.
> >>>>>>
> >>>>>> And since it' causes the system to crash... it's super annoying when
> >>>>>> the rest of the system is working fine.  Please only use a WARN_ON,
> >>>>>> and maybe even set the RAID volume readonly, etc.  But don't bring
> >>>>>> down the rest of the system if possible.
> >>>>>>
> >>>>>> John
> >>>>> I reverted the first patch as it made its way upstream.
> >>>>>
> >>>>> Testing this change now.
> >>>>>
> >>>>> ---
> >>>>>
> >>>>>     drivers/md/raid5.c | 2 +-
> >>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >>>>> index c033bfcb209e..660ca3af2914 100644
> >>>>> --- a/drivers/md/raid5.c
> >>>>> +++ b/drivers/md/raid5.c
> >>>>> @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf
> >>>>> *conf, struct stripe_head *sh,
> >>>>>             /* now write out any block on a failed drive,
> >>>>>              * or P or Q if they were recomputed
> >>>>>              */
> >>>>> -        BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */
> >>>>> +        WARN_ON(s->uptodate < disks - 2); /* We don't need Q to recover */
> >>>> I think this WARN_ON() is the best way to go, though the comment "don't need Q"
> >>>> needs some revise.
> >>>>
> >>>> Nigel, how does this work in your tests?
> >>>>
> >>>> Thanks,
> >>>> Song
> >>> Could you please share updates about the tests?
> >>>
> >>> Thanks,
> >>> Song
> >> Hello Song,
> >> Our customer has confirmed, the issue is fixed. He had run this on two
> >> systems with defective Disks and run raid-check. There were some errors
> >> corrected, the device was degraded, but the kernel never crashed.
> >>
> >> -Nigel
> > Thanks Nigel!
> >
> > Could you please submit official patch?
> >
> > Song
> >
> >>>>>             if (s->failed == 2) {
> >>>>>                 dev = &sh->dev[s->failed_num[1]];
> >>>>>                 s->locked++;
> >>>>> --
> >>>>> 2.20.1
> >>>>>
>
> Song,   You will need to revert 4f4fd7c5798 before applying this:
> -Nigel
>
> From: Nigel Croxon <ncroxon@xxxxxxxxxx>
> Date: Wed, 10 Apr 2019 09:23:34 -0400
> Subject: [PATCH ] preserve the writeback action after the parity check
>
> The problem is that any 'uptodate' vs 'disks' check is not precise
> in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the
> device that might try to kick off writes and then skip the action.
> Better to prevent the raid driver from taking unexpected action *and* keep
> the system alive vs killing the machine with BUG_ON.
>
> Fixes: 4f4fd7c5798 Don't jump to compute_result state from check_result
> state
>
> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx>
> ---
>   drivers/md/raid5.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index f957b7e42562..4f6b32366c00 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf
> *conf, struct stripe_head *sh,
>           /* now write out any block on a failed drive,
>            * or P or Q if they were recomputed
>            */
> -        BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */
> +        dev = NULL;
>           if (s->failed == 2) {
>               dev = &sh->dev[s->failed_num[1]];
>               s->locked++;
> @@ -4212,6 +4212,14 @@ static void handle_parity_checks6(struct r5conf
> *conf, struct stripe_head *sh,
>               set_bit(R5_LOCKED, &dev->flags);
>               set_bit(R5_Wantwrite, &dev->flags);
>           }
> +        if (WARN_ONCE(dev && !test_bit(R5_UPTODATE, &dev->flags),
> +                    "%s: disk%ld not up to date\n",
> +                    mdname(conf->mddev),
> +                    dev - (struct r5dev *) &sh->dev)) {
> +            clear_bit(R5_LOCKED, &dev->flags);
> +            clear_bit(R5_Wantwrite, &dev->flags);
> +            s->locked--;
> +        }
>           clear_bit(STRIPE_DEGRADED, &sh->state);
>
>           set_bit(STRIPE_INSYNC, &sh->state);
> --
> 2.20.1
>
>

Thanks Nigel!

Pushed to https://github.com/liu-song-6/linux/tree/md-next .

Song