Re: [PATCH] Don't jump to compute_result state from check_result state

Song Liu <liu.song.a23@xxxxxxxxx> · Tue, 16 Apr 2019 14:35:42 -0700

On Tue, Apr 16, 2019 at 2:31 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
>
>
> On 4/16/19 5:18 PM, Song Liu wrote:
> > On Tue, Apr 16, 2019 at 1:24 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
> >>
> >> On 4/16/19 1:40 PM, Song Liu wrote:
> >>> On Mon, Apr 15, 2019 at 1:53 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 04/15/2019 03:24 PM, Song Liu wrote:
> >>>>> On Mon, Apr 15, 2019 at 10:54 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
> >>>>>> On 04/15/2019 01:44 PM, Song Liu wrote:
> >>>>>>> Hi Nigel,
> >>>>>>>
> >>>>>>> On Thu, Apr 11, 2019 at 10:21 AM Song Liu <liu.song.a23@xxxxxxxxx> wrote:
> >>>>>>>> On Tue, Apr 9, 2019 at 12:26 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
> >>>>>>>>> On 4/9/19 2:09 PM, John Stoffel wrote:
> >>>>>>>>>>>>>>> "Dan" == Dan Williams <dan.j.williams@xxxxxxxxx> writes:
> >>>>>>>>>> Dan> On Mon, Apr 8, 2019 at 4:18 PM Song Liu <liu.song.a23@xxxxxxxxx> wrote:
> >>>>>>>>>> Dan> [..]
> >>>>>>>>>>>>>> To trigger this issue, you not only need a failed disk but to also
> >>>>>>>>>>>>>> perform a scrubbing operation.  The customer's systems both crashed
> >>>>>>>>>>>>>> early Sunday morning when the raid-check script is run by default from cron.
> >>>>>>>>>>>>> Ok, I follow this, but I come to a different answer on the required
> >>>>>>>>>>>>> fix. I think it is simply the following to preserve the writeback
> >>>>>>>>>>>>> action after the parity check, because we need the failed Q slot to be
> >>>>>>>>>>>>> written back if we're recovering. P will be not up-to-date because it
> >>>>>>>>>>>>> was checked with the good disks, but sh->ops.zero_sum_result will be
> >>>>>>>>>>>>> 0, so that will skip the writeback of a !uptodate P value.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >>>>>>>>>>>>> index c033bfcb209e..e2eb59289346 100644
> >>>>>>>>>>>>> --- a/drivers/md/raid5.c
> >>>>>>>>>>>>> +++ b/drivers/md/raid5.c
> >>>>>>>>>>>>> @@ -4187,7 +4187,6 @@ static void handle_parity_checks6(struct r5conf
> >>>>>>>>>>>>> *conf, struct stripe_head *sh,
> >>>>>>>>>>>>>                      /* now write out any block on a failed drive,
> >>>>>>>>>>>>>                       * or P or Q if they were recomputed
> >>>>>>>>>>>>>                       */
> >>>>>>>>>>>>> -               BUG_ON(s->uptodate < disks - 1); /* We don't need Q to
> >>>>>>>>>>>>> recover */
> >>>>>>>>>>>> Thanks Dan!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Would it make sense to rework the check as
> >>>>>>>>>>>>
> >>>>>>>>>>>> BUG_ON(s->uptodate < disks - 2);
> >>>>>>>>>> Dan> I think the problem is that any 'uptodate' vs 'disks' check is
> >>>>>>>>>> Dan> not precise enough in this path. What might be better is to put
> >>>>>>>>>> Dan> "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the devices that
> >>>>>>>>>> Dan> might try to kick off writes and then skip the action. Better to
> >>>>>>>>>> Dan> prevent the raid driver from taking unexpected action *and* keep
> >>>>>>>>>> Dan> the system alive vs killing the machine with BUG_ON.
> >>>>>>>>>>
> >>>>>>>>>> Dan> BUG_ON has fallen out of favor for exception reporting since
> >>>>>>>>>> Dan> those assertions were introduced.
> >>>>>>>>>>
> >>>>>>>>>> And since it' causes the system to crash... it's super annoying when
> >>>>>>>>>> the rest of the system is working fine.  Please only use a WARN_ON,
> >>>>>>>>>> and maybe even set the RAID volume readonly, etc.  But don't bring
> >>>>>>>>>> down the rest of the system if possible.
> >>>>>>>>>>
> >>>>>>>>>> John
> >>>>>>>>> I reverted the first patch as it made its way upstream.
> >>>>>>>>>
> >>>>>>>>> Testing this change now.
> >>>>>>>>>
> >>>>>>>>> ---
> >>>>>>>>>
> >>>>>>>>>       drivers/md/raid5.c | 2 +-
> >>>>>>>>>       1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >>>>>>>>> index c033bfcb209e..660ca3af2914 100644
> >>>>>>>>> --- a/drivers/md/raid5.c
> >>>>>>>>> +++ b/drivers/md/raid5.c
> >>>>>>>>> @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf
> >>>>>>>>> *conf, struct stripe_head *sh,
> >>>>>>>>>               /* now write out any block on a failed drive,
> >>>>>>>>>                * or P or Q if they were recomputed
> >>>>>>>>>                */
> >>>>>>>>> -        BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */
> >>>>>>>>> +        WARN_ON(s->uptodate < disks - 2); /* We don't need Q to recover */
> >>>>>>>> I think this WARN_ON() is the best way to go, though the comment "don't need Q"
> >>>>>>>> needs some revise.
> >>>>>>>>
> >>>>>>>> Nigel, how does this work in your tests?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Song
> >>>>>>> Could you please share updates about the tests?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Song
> >>>>>> Hello Song,
> >>>>>> Our customer has confirmed, the issue is fixed. He had run this on two
> >>>>>> systems with defective Disks and run raid-check. There were some errors
> >>>>>> corrected, the device was degraded, but the kernel never crashed.
> >>>>>>
> >>>>>> -Nigel
> >>>>> Thanks Nigel!
> >>>>>
> >>>>> Could you please submit official patch?
> >>>>>
> >>>>> Song
> >>>>>
> >>>>>>>>>               if (s->failed == 2) {
> >>>>>>>>>                   dev = &sh->dev[s->failed_num[1]];
> >>>>>>>>>                   s->locked++;
> >>>>>>>>> --
> >>>>>>>>> 2.20.1
> >>>>>>>>>
> >>>> Song,   You will need to revert 4f4fd7c5798 before applying this:
> >>>> -Nigel
> >>>>
> >>>> From: Nigel Croxon <ncroxon@xxxxxxxxxx>
> >>>> Date: Wed, 10 Apr 2019 09:23:34 -0400
> >>>> Subject: [PATCH ] preserve the writeback action after the parity check
> >>>>
> >>>> The problem is that any 'uptodate' vs 'disks' check is not precise
> >>>> in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the
> >>>> device that might try to kick off writes and then skip the action.
> >>>> Better to prevent the raid driver from taking unexpected action *and* keep
> >>>> the system alive vs killing the machine with BUG_ON.
> >>>>
> >>>> Fixes: 4f4fd7c5798 Don't jump to compute_result state from check_result
> >>>> state
> >>>>
> >>>> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> >>>> Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx>
> >>>> ---
> >>>>     drivers/md/raid5.c | 10 +++++++++-
> >>>>     1 file changed, 9 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >>>> index f957b7e42562..4f6b32366c00 100644
> >>>> --- a/drivers/md/raid5.c
> >>>> +++ b/drivers/md/raid5.c
> >>>> @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf
> >>>> *conf, struct stripe_head *sh,
> >>>>             /* now write out any block on a failed drive,
> >>>>              * or P or Q if they were recomputed
> >>>>              */
> >>>> -        BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */
> >>>> +        dev = NULL;
> >>>>             if (s->failed == 2) {
> >>>>                 dev = &sh->dev[s->failed_num[1]];
> >>>>                 s->locked++;
> >>>> @@ -4212,6 +4212,14 @@ static void handle_parity_checks6(struct r5conf
> >>>> *conf, struct stripe_head *sh,
> >>>>                 set_bit(R5_LOCKED, &dev->flags);
> >>>>                 set_bit(R5_Wantwrite, &dev->flags);
> >>>>             }
> >>>> +        if (WARN_ONCE(dev && !test_bit(R5_UPTODATE, &dev->flags),
> >>>> +                    "%s: disk%ld not up to date\n",
> >>>> +                    mdname(conf->mddev),
> >>>> +                    dev - (struct r5dev *) &sh->dev)) {
> >>>> +            clear_bit(R5_LOCKED, &dev->flags);
> >>>> +            clear_bit(R5_Wantwrite, &dev->flags);
> >>>> +            s->locked--;
> >>>> +        }
> >>>>             clear_bit(STRIPE_DEGRADED, &sh->state);
> >>>>
> >>>>             set_bit(STRIPE_INSYNC, &sh->state);
> >>>> --
> >>>> 2.20.1
> >>>>
> >>>>
> >>> Thanks Nigel!
> >>>
> >>> Pushed to https://github.com/liu-song-6/linux/tree/md-next .
> >>>
> >>> Song
> >> From: Nigel Croxon <ncroxon@xxxxxxxxxx>
> >> Date: Tue, 16 Apr 2019 15:37:30 -0400
> >> Subject: [PATCH] preserve the writeback action after the parity check
> >>
> >>
> >> The problem is that any 'uptodate' vs 'disks' check is not precise
> >> in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the
> >> device that might try to kick off writes and then skip the action.
> >> Better to prevent the raid driver from taking unexpected action *and* keep
> >> the system alive vs killing the machine with BUG_ON.
> >>
> >> V2: change format '%ld' argument of type 'long int', to 'int'.
> >>
> >> Fixes: 4f4fd7c5798 Don't jump to compute_result state from check_result
> >> state
> >>
> >> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> >> Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx>
> > How about something like this?
> >
> > https://github.com/liu-song-6/linux/commit/88b4458c52ecbc3e016ec95ef866f19afe4550a0
>
> The cast long is fine with me.
>
> -Nigel

Actually, %td should be the best:

https://github.com/liu-song-6/linux/commit/b2176a1dfb518d870ee073445d27055fea64dfb8

Thanks,
Song

>
> >> ---
> >>    drivers/md/raid5.c | 10 +++++++++-
> >>    1 file changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> index f957b7e42562..395e7ec0fb6e 100644
> >> --- a/drivers/md/raid5.c
> >> +++ b/drivers/md/raid5.c
> >> @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf
> >> *conf, struct stripe_head *sh,
> >>            /* now write out any block on a failed drive,
> >>             * or P or Q if they were recomputed
> >>             */
> >> -        BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */
> >> +        dev = NULL;
> >>            if (s->failed == 2) {
> >>                dev = &sh->dev[s->failed_num[1]];
> >>                s->locked++;
> >> @@ -4212,6 +4212,14 @@ static void handle_parity_checks6(struct r5conf
> >> *conf, struct stripe_head *sh,
> >>                set_bit(R5_LOCKED, &dev->flags);
> >>                set_bit(R5_Wantwrite, &dev->flags);
> >>            }
> >> +        if (WARN_ONCE(dev && !test_bit(R5_UPTODATE, &dev->flags),
> >> +            "%s: disk%lx not up to date\n",
> >> +            mdname(conf->mddev),
> >> +            dev - (struct r5dev *) &sh->dev)) {
> >> +            clear_bit(R5_LOCKED, &dev->flags);
> >> +            clear_bit(R5_Wantwrite, &dev->flags);
> >> +            s->locked--;
> >> +        }
> >>            clear_bit(STRIPE_DEGRADED, &sh->state);
> >>
> >>            set_bit(STRIPE_INSYNC, &sh->state);
> >> --
> >> 2.20.1
> >>
> >>
> >>
> >>
> >>