On Mon, Apr 15, 2019 at 1:53 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote: > > > > On 04/15/2019 03:24 PM, Song Liu wrote: > > On Mon, Apr 15, 2019 at 10:54 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote: > >> > >> > >> On 04/15/2019 01:44 PM, Song Liu wrote: > >>> Hi Nigel, > >>> > >>> On Thu, Apr 11, 2019 at 10:21 AM Song Liu <liu.song.a23@xxxxxxxxx> wrote: > >>>> On Tue, Apr 9, 2019 at 12:26 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote: > >>>>> On 4/9/19 2:09 PM, John Stoffel wrote: > >>>>>>>>>>> "Dan" == Dan Williams <dan.j.williams@xxxxxxxxx> writes: > >>>>>> Dan> On Mon, Apr 8, 2019 at 4:18 PM Song Liu <liu.song.a23@xxxxxxxxx> wrote: > >>>>>> Dan> [..] > >>>>>>>>>> To trigger this issue, you not only need a failed disk but to also > >>>>>>>>>> perform a scrubbing operation. The customer's systems both crashed > >>>>>>>>>> early Sunday morning when the raid-check script is run by default from cron. > >>>>>>>>> Ok, I follow this, but I come to a different answer on the required > >>>>>>>>> fix. I think it is simply the following to preserve the writeback > >>>>>>>>> action after the parity check, because we need the failed Q slot to be > >>>>>>>>> written back if we're recovering. P will be not up-to-date because it > >>>>>>>>> was checked with the good disks, but sh->ops.zero_sum_result will be > >>>>>>>>> 0, so that will skip the writeback of a !uptodate P value. > >>>>>>>>> > >>>>>>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > >>>>>>>>> index c033bfcb209e..e2eb59289346 100644 > >>>>>>>>> --- a/drivers/md/raid5.c > >>>>>>>>> +++ b/drivers/md/raid5.c > >>>>>>>>> @@ -4187,7 +4187,6 @@ static void handle_parity_checks6(struct r5conf > >>>>>>>>> *conf, struct stripe_head *sh, > >>>>>>>>> /* now write out any block on a failed drive, > >>>>>>>>> * or P or Q if they were recomputed > >>>>>>>>> */ > >>>>>>>>> - BUG_ON(s->uptodate < disks - 1); /* We don't need Q to > >>>>>>>>> recover */ > >>>>>>>> Thanks Dan! > >>>>>>>> > >>>>>>>> Would it make sense to rework the check as > >>>>>>>> > >>>>>>>> BUG_ON(s->uptodate < disks - 2); > >>>>>> Dan> I think the problem is that any 'uptodate' vs 'disks' check is > >>>>>> Dan> not precise enough in this path. What might be better is to put > >>>>>> Dan> "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the devices that > >>>>>> Dan> might try to kick off writes and then skip the action. Better to > >>>>>> Dan> prevent the raid driver from taking unexpected action *and* keep > >>>>>> Dan> the system alive vs killing the machine with BUG_ON. > >>>>>> > >>>>>> Dan> BUG_ON has fallen out of favor for exception reporting since > >>>>>> Dan> those assertions were introduced. > >>>>>> > >>>>>> And since it' causes the system to crash... it's super annoying when > >>>>>> the rest of the system is working fine. Please only use a WARN_ON, > >>>>>> and maybe even set the RAID volume readonly, etc. But don't bring > >>>>>> down the rest of the system if possible. > >>>>>> > >>>>>> John > >>>>> I reverted the first patch as it made its way upstream. > >>>>> > >>>>> Testing this change now. > >>>>> > >>>>> --- > >>>>> > >>>>> drivers/md/raid5.c | 2 +- > >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>>> > >>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > >>>>> index c033bfcb209e..660ca3af2914 100644 > >>>>> --- a/drivers/md/raid5.c > >>>>> +++ b/drivers/md/raid5.c > >>>>> @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf > >>>>> *conf, struct stripe_head *sh, > >>>>> /* now write out any block on a failed drive, > >>>>> * or P or Q if they were recomputed > >>>>> */ > >>>>> - BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */ > >>>>> + WARN_ON(s->uptodate < disks - 2); /* We don't need Q to recover */ > >>>> I think this WARN_ON() is the best way to go, though the comment "don't need Q" > >>>> needs some revise. > >>>> > >>>> Nigel, how does this work in your tests? > >>>> > >>>> Thanks, > >>>> Song > >>> Could you please share updates about the tests? > >>> > >>> Thanks, > >>> Song > >> Hello Song, > >> Our customer has confirmed, the issue is fixed. He had run this on two > >> systems with defective Disks and run raid-check. There were some errors > >> corrected, the device was degraded, but the kernel never crashed. > >> > >> -Nigel > > Thanks Nigel! > > > > Could you please submit official patch? > > > > Song > > > >>>>> if (s->failed == 2) { > >>>>> dev = &sh->dev[s->failed_num[1]]; > >>>>> s->locked++; > >>>>> -- > >>>>> 2.20.1 > >>>>> > > Song, You will need to revert 4f4fd7c5798 before applying this: > -Nigel > > From: Nigel Croxon <ncroxon@xxxxxxxxxx> > Date: Wed, 10 Apr 2019 09:23:34 -0400 > Subject: [PATCH ] preserve the writeback action after the parity check > > The problem is that any 'uptodate' vs 'disks' check is not precise > in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the > device that might try to kick off writes and then skip the action. > Better to prevent the raid driver from taking unexpected action *and* keep > the system alive vs killing the machine with BUG_ON. > > Fixes: 4f4fd7c5798 Don't jump to compute_result state from check_result > state > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx> > --- > drivers/md/raid5.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index f957b7e42562..4f6b32366c00 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf > *conf, struct stripe_head *sh, > /* now write out any block on a failed drive, > * or P or Q if they were recomputed > */ > - BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */ > + dev = NULL; > if (s->failed == 2) { > dev = &sh->dev[s->failed_num[1]]; > s->locked++; > @@ -4212,6 +4212,14 @@ static void handle_parity_checks6(struct r5conf > *conf, struct stripe_head *sh, > set_bit(R5_LOCKED, &dev->flags); > set_bit(R5_Wantwrite, &dev->flags); > } > + if (WARN_ONCE(dev && !test_bit(R5_UPTODATE, &dev->flags), > + "%s: disk%ld not up to date\n", > + mdname(conf->mddev), > + dev - (struct r5dev *) &sh->dev)) { > + clear_bit(R5_LOCKED, &dev->flags); > + clear_bit(R5_Wantwrite, &dev->flags); > + s->locked--; > + } > clear_bit(STRIPE_DEGRADED, &sh->state); > > set_bit(STRIPE_INSYNC, &sh->state); > -- > 2.20.1 > > Thanks Nigel! Pushed to https://github.com/liu-song-6/linux/tree/md-next . Song