On Tue, Apr 16, 2019 at 1:24 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote: > > > On 4/16/19 1:40 PM, Song Liu wrote: > > On Mon, Apr 15, 2019 at 1:53 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote: > >> > >> > >> On 04/15/2019 03:24 PM, Song Liu wrote: > >>> On Mon, Apr 15, 2019 at 10:54 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote: > >>>> > >>>> On 04/15/2019 01:44 PM, Song Liu wrote: > >>>>> Hi Nigel, > >>>>> > >>>>> On Thu, Apr 11, 2019 at 10:21 AM Song Liu <liu.song.a23@xxxxxxxxx> wrote: > >>>>>> On Tue, Apr 9, 2019 at 12:26 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote: > >>>>>>> On 4/9/19 2:09 PM, John Stoffel wrote: > >>>>>>>>>>>>> "Dan" == Dan Williams <dan.j.williams@xxxxxxxxx> writes: > >>>>>>>> Dan> On Mon, Apr 8, 2019 at 4:18 PM Song Liu <liu.song.a23@xxxxxxxxx> wrote: > >>>>>>>> Dan> [..] > >>>>>>>>>>>> To trigger this issue, you not only need a failed disk but to also > >>>>>>>>>>>> perform a scrubbing operation. The customer's systems both crashed > >>>>>>>>>>>> early Sunday morning when the raid-check script is run by default from cron. > >>>>>>>>>>> Ok, I follow this, but I come to a different answer on the required > >>>>>>>>>>> fix. I think it is simply the following to preserve the writeback > >>>>>>>>>>> action after the parity check, because we need the failed Q slot to be > >>>>>>>>>>> written back if we're recovering. P will be not up-to-date because it > >>>>>>>>>>> was checked with the good disks, but sh->ops.zero_sum_result will be > >>>>>>>>>>> 0, so that will skip the writeback of a !uptodate P value. > >>>>>>>>>>> > >>>>>>>>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > >>>>>>>>>>> index c033bfcb209e..e2eb59289346 100644 > >>>>>>>>>>> --- a/drivers/md/raid5.c > >>>>>>>>>>> +++ b/drivers/md/raid5.c > >>>>>>>>>>> @@ -4187,7 +4187,6 @@ static void handle_parity_checks6(struct r5conf > >>>>>>>>>>> *conf, struct stripe_head *sh, > >>>>>>>>>>> /* now write out any block on a failed drive, > >>>>>>>>>>> * or P or Q if they were recomputed > >>>>>>>>>>> */ > >>>>>>>>>>> - BUG_ON(s->uptodate < disks - 1); /* We don't need Q to > >>>>>>>>>>> recover */ > >>>>>>>>>> Thanks Dan! > >>>>>>>>>> > >>>>>>>>>> Would it make sense to rework the check as > >>>>>>>>>> > >>>>>>>>>> BUG_ON(s->uptodate < disks - 2); > >>>>>>>> Dan> I think the problem is that any 'uptodate' vs 'disks' check is > >>>>>>>> Dan> not precise enough in this path. What might be better is to put > >>>>>>>> Dan> "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the devices that > >>>>>>>> Dan> might try to kick off writes and then skip the action. Better to > >>>>>>>> Dan> prevent the raid driver from taking unexpected action *and* keep > >>>>>>>> Dan> the system alive vs killing the machine with BUG_ON. > >>>>>>>> > >>>>>>>> Dan> BUG_ON has fallen out of favor for exception reporting since > >>>>>>>> Dan> those assertions were introduced. > >>>>>>>> > >>>>>>>> And since it' causes the system to crash... it's super annoying when > >>>>>>>> the rest of the system is working fine. Please only use a WARN_ON, > >>>>>>>> and maybe even set the RAID volume readonly, etc. But don't bring > >>>>>>>> down the rest of the system if possible. > >>>>>>>> > >>>>>>>> John > >>>>>>> I reverted the first patch as it made its way upstream. > >>>>>>> > >>>>>>> Testing this change now. > >>>>>>> > >>>>>>> --- > >>>>>>> > >>>>>>> drivers/md/raid5.c | 2 +- > >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>>>>> > >>>>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > >>>>>>> index c033bfcb209e..660ca3af2914 100644 > >>>>>>> --- a/drivers/md/raid5.c > >>>>>>> +++ b/drivers/md/raid5.c > >>>>>>> @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf > >>>>>>> *conf, struct stripe_head *sh, > >>>>>>> /* now write out any block on a failed drive, > >>>>>>> * or P or Q if they were recomputed > >>>>>>> */ > >>>>>>> - BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */ > >>>>>>> + WARN_ON(s->uptodate < disks - 2); /* We don't need Q to recover */ > >>>>>> I think this WARN_ON() is the best way to go, though the comment "don't need Q" > >>>>>> needs some revise. > >>>>>> > >>>>>> Nigel, how does this work in your tests? > >>>>>> > >>>>>> Thanks, > >>>>>> Song > >>>>> Could you please share updates about the tests? > >>>>> > >>>>> Thanks, > >>>>> Song > >>>> Hello Song, > >>>> Our customer has confirmed, the issue is fixed. He had run this on two > >>>> systems with defective Disks and run raid-check. There were some errors > >>>> corrected, the device was degraded, but the kernel never crashed. > >>>> > >>>> -Nigel > >>> Thanks Nigel! > >>> > >>> Could you please submit official patch? > >>> > >>> Song > >>> > >>>>>>> if (s->failed == 2) { > >>>>>>> dev = &sh->dev[s->failed_num[1]]; > >>>>>>> s->locked++; > >>>>>>> -- > >>>>>>> 2.20.1 > >>>>>>> > >> Song, You will need to revert 4f4fd7c5798 before applying this: > >> -Nigel > >> > >> From: Nigel Croxon <ncroxon@xxxxxxxxxx> > >> Date: Wed, 10 Apr 2019 09:23:34 -0400 > >> Subject: [PATCH ] preserve the writeback action after the parity check > >> > >> The problem is that any 'uptodate' vs 'disks' check is not precise > >> in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the > >> device that might try to kick off writes and then skip the action. > >> Better to prevent the raid driver from taking unexpected action *and* keep > >> the system alive vs killing the machine with BUG_ON. > >> > >> Fixes: 4f4fd7c5798 Don't jump to compute_result state from check_result > >> state > >> > >> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > >> Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx> > >> --- > >> drivers/md/raid5.c | 10 +++++++++- > >> 1 file changed, 9 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > >> index f957b7e42562..4f6b32366c00 100644 > >> --- a/drivers/md/raid5.c > >> +++ b/drivers/md/raid5.c > >> @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf > >> *conf, struct stripe_head *sh, > >> /* now write out any block on a failed drive, > >> * or P or Q if they were recomputed > >> */ > >> - BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */ > >> + dev = NULL; > >> if (s->failed == 2) { > >> dev = &sh->dev[s->failed_num[1]]; > >> s->locked++; > >> @@ -4212,6 +4212,14 @@ static void handle_parity_checks6(struct r5conf > >> *conf, struct stripe_head *sh, > >> set_bit(R5_LOCKED, &dev->flags); > >> set_bit(R5_Wantwrite, &dev->flags); > >> } > >> + if (WARN_ONCE(dev && !test_bit(R5_UPTODATE, &dev->flags), > >> + "%s: disk%ld not up to date\n", > >> + mdname(conf->mddev), > >> + dev - (struct r5dev *) &sh->dev)) { > >> + clear_bit(R5_LOCKED, &dev->flags); > >> + clear_bit(R5_Wantwrite, &dev->flags); > >> + s->locked--; > >> + } > >> clear_bit(STRIPE_DEGRADED, &sh->state); > >> > >> set_bit(STRIPE_INSYNC, &sh->state); > >> -- > >> 2.20.1 > >> > >> > > Thanks Nigel! > > > > Pushed to https://github.com/liu-song-6/linux/tree/md-next . > > > > Song > > From: Nigel Croxon <ncroxon@xxxxxxxxxx> > Date: Tue, 16 Apr 2019 15:37:30 -0400 > Subject: [PATCH] preserve the writeback action after the parity check > > > The problem is that any 'uptodate' vs 'disks' check is not precise > in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the > device that might try to kick off writes and then skip the action. > Better to prevent the raid driver from taking unexpected action *and* keep > the system alive vs killing the machine with BUG_ON. > > V2: change format '%ld' argument of type 'long int', to 'int'. > > Fixes: 4f4fd7c5798 Don't jump to compute_result state from check_result > state > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx> How about something like this? https://github.com/liu-song-6/linux/commit/88b4458c52ecbc3e016ec95ef866f19afe4550a0 > --- > drivers/md/raid5.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index f957b7e42562..395e7ec0fb6e 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf > *conf, struct stripe_head *sh, > /* now write out any block on a failed drive, > * or P or Q if they were recomputed > */ > - BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */ > + dev = NULL; > if (s->failed == 2) { > dev = &sh->dev[s->failed_num[1]]; > s->locked++; > @@ -4212,6 +4212,14 @@ static void handle_parity_checks6(struct r5conf > *conf, struct stripe_head *sh, > set_bit(R5_LOCKED, &dev->flags); > set_bit(R5_Wantwrite, &dev->flags); > } > + if (WARN_ONCE(dev && !test_bit(R5_UPTODATE, &dev->flags), > + "%s: disk%lx not up to date\n", > + mdname(conf->mddev), > + dev - (struct r5dev *) &sh->dev)) { > + clear_bit(R5_LOCKED, &dev->flags); > + clear_bit(R5_Wantwrite, &dev->flags); > + s->locked--; > + } > clear_bit(STRIPE_DEGRADED, &sh->state); > > set_bit(STRIPE_INSYNC, &sh->state); > -- > 2.20.1 > > > > >