On Tue, Apr 16, 2019 at 2:31 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote: > > > On 4/16/19 5:18 PM, Song Liu wrote: > > On Tue, Apr 16, 2019 at 1:24 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote: > >> > >> On 4/16/19 1:40 PM, Song Liu wrote: > >>> On Mon, Apr 15, 2019 at 1:53 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote: > >>>> > >>>> On 04/15/2019 03:24 PM, Song Liu wrote: > >>>>> On Mon, Apr 15, 2019 at 10:54 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote: > >>>>>> On 04/15/2019 01:44 PM, Song Liu wrote: > >>>>>>> Hi Nigel, > >>>>>>> > >>>>>>> On Thu, Apr 11, 2019 at 10:21 AM Song Liu <liu.song.a23@xxxxxxxxx> wrote: > >>>>>>>> On Tue, Apr 9, 2019 at 12:26 PM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote: > >>>>>>>>> On 4/9/19 2:09 PM, John Stoffel wrote: > >>>>>>>>>>>>>>> "Dan" == Dan Williams <dan.j.williams@xxxxxxxxx> writes: > >>>>>>>>>> Dan> On Mon, Apr 8, 2019 at 4:18 PM Song Liu <liu.song.a23@xxxxxxxxx> wrote: > >>>>>>>>>> Dan> [..] > >>>>>>>>>>>>>> To trigger this issue, you not only need a failed disk but to also > >>>>>>>>>>>>>> perform a scrubbing operation. The customer's systems both crashed > >>>>>>>>>>>>>> early Sunday morning when the raid-check script is run by default from cron. > >>>>>>>>>>>>> Ok, I follow this, but I come to a different answer on the required > >>>>>>>>>>>>> fix. I think it is simply the following to preserve the writeback > >>>>>>>>>>>>> action after the parity check, because we need the failed Q slot to be > >>>>>>>>>>>>> written back if we're recovering. P will be not up-to-date because it > >>>>>>>>>>>>> was checked with the good disks, but sh->ops.zero_sum_result will be > >>>>>>>>>>>>> 0, so that will skip the writeback of a !uptodate P value. > >>>>>>>>>>>>> > >>>>>>>>>>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > >>>>>>>>>>>>> index c033bfcb209e..e2eb59289346 100644 > >>>>>>>>>>>>> --- a/drivers/md/raid5.c > >>>>>>>>>>>>> +++ b/drivers/md/raid5.c > >>>>>>>>>>>>> @@ -4187,7 +4187,6 @@ static void handle_parity_checks6(struct r5conf > >>>>>>>>>>>>> *conf, struct stripe_head *sh, > >>>>>>>>>>>>> /* now write out any block on a failed drive, > >>>>>>>>>>>>> * or P or Q if they were recomputed > >>>>>>>>>>>>> */ > >>>>>>>>>>>>> - BUG_ON(s->uptodate < disks - 1); /* We don't need Q to > >>>>>>>>>>>>> recover */ > >>>>>>>>>>>> Thanks Dan! > >>>>>>>>>>>> > >>>>>>>>>>>> Would it make sense to rework the check as > >>>>>>>>>>>> > >>>>>>>>>>>> BUG_ON(s->uptodate < disks - 2); > >>>>>>>>>> Dan> I think the problem is that any 'uptodate' vs 'disks' check is > >>>>>>>>>> Dan> not precise enough in this path. What might be better is to put > >>>>>>>>>> Dan> "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the devices that > >>>>>>>>>> Dan> might try to kick off writes and then skip the action. Better to > >>>>>>>>>> Dan> prevent the raid driver from taking unexpected action *and* keep > >>>>>>>>>> Dan> the system alive vs killing the machine with BUG_ON. > >>>>>>>>>> > >>>>>>>>>> Dan> BUG_ON has fallen out of favor for exception reporting since > >>>>>>>>>> Dan> those assertions were introduced. > >>>>>>>>>> > >>>>>>>>>> And since it' causes the system to crash... it's super annoying when > >>>>>>>>>> the rest of the system is working fine. Please only use a WARN_ON, > >>>>>>>>>> and maybe even set the RAID volume readonly, etc. But don't bring > >>>>>>>>>> down the rest of the system if possible. > >>>>>>>>>> > >>>>>>>>>> John > >>>>>>>>> I reverted the first patch as it made its way upstream. > >>>>>>>>> > >>>>>>>>> Testing this change now. > >>>>>>>>> > >>>>>>>>> --- > >>>>>>>>> > >>>>>>>>> drivers/md/raid5.c | 2 +- > >>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>>>>>>> > >>>>>>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > >>>>>>>>> index c033bfcb209e..660ca3af2914 100644 > >>>>>>>>> --- a/drivers/md/raid5.c > >>>>>>>>> +++ b/drivers/md/raid5.c > >>>>>>>>> @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf > >>>>>>>>> *conf, struct stripe_head *sh, > >>>>>>>>> /* now write out any block on a failed drive, > >>>>>>>>> * or P or Q if they were recomputed > >>>>>>>>> */ > >>>>>>>>> - BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */ > >>>>>>>>> + WARN_ON(s->uptodate < disks - 2); /* We don't need Q to recover */ > >>>>>>>> I think this WARN_ON() is the best way to go, though the comment "don't need Q" > >>>>>>>> needs some revise. > >>>>>>>> > >>>>>>>> Nigel, how does this work in your tests? > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Song > >>>>>>> Could you please share updates about the tests? > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Song > >>>>>> Hello Song, > >>>>>> Our customer has confirmed, the issue is fixed. He had run this on two > >>>>>> systems with defective Disks and run raid-check. There were some errors > >>>>>> corrected, the device was degraded, but the kernel never crashed. > >>>>>> > >>>>>> -Nigel > >>>>> Thanks Nigel! > >>>>> > >>>>> Could you please submit official patch? > >>>>> > >>>>> Song > >>>>> > >>>>>>>>> if (s->failed == 2) { > >>>>>>>>> dev = &sh->dev[s->failed_num[1]]; > >>>>>>>>> s->locked++; > >>>>>>>>> -- > >>>>>>>>> 2.20.1 > >>>>>>>>> > >>>> Song, You will need to revert 4f4fd7c5798 before applying this: > >>>> -Nigel > >>>> > >>>> From: Nigel Croxon <ncroxon@xxxxxxxxxx> > >>>> Date: Wed, 10 Apr 2019 09:23:34 -0400 > >>>> Subject: [PATCH ] preserve the writeback action after the parity check > >>>> > >>>> The problem is that any 'uptodate' vs 'disks' check is not precise > >>>> in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the > >>>> device that might try to kick off writes and then skip the action. > >>>> Better to prevent the raid driver from taking unexpected action *and* keep > >>>> the system alive vs killing the machine with BUG_ON. > >>>> > >>>> Fixes: 4f4fd7c5798 Don't jump to compute_result state from check_result > >>>> state > >>>> > >>>> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > >>>> Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx> > >>>> --- > >>>> drivers/md/raid5.c | 10 +++++++++- > >>>> 1 file changed, 9 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > >>>> index f957b7e42562..4f6b32366c00 100644 > >>>> --- a/drivers/md/raid5.c > >>>> +++ b/drivers/md/raid5.c > >>>> @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf > >>>> *conf, struct stripe_head *sh, > >>>> /* now write out any block on a failed drive, > >>>> * or P or Q if they were recomputed > >>>> */ > >>>> - BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */ > >>>> + dev = NULL; > >>>> if (s->failed == 2) { > >>>> dev = &sh->dev[s->failed_num[1]]; > >>>> s->locked++; > >>>> @@ -4212,6 +4212,14 @@ static void handle_parity_checks6(struct r5conf > >>>> *conf, struct stripe_head *sh, > >>>> set_bit(R5_LOCKED, &dev->flags); > >>>> set_bit(R5_Wantwrite, &dev->flags); > >>>> } > >>>> + if (WARN_ONCE(dev && !test_bit(R5_UPTODATE, &dev->flags), > >>>> + "%s: disk%ld not up to date\n", > >>>> + mdname(conf->mddev), > >>>> + dev - (struct r5dev *) &sh->dev)) { > >>>> + clear_bit(R5_LOCKED, &dev->flags); > >>>> + clear_bit(R5_Wantwrite, &dev->flags); > >>>> + s->locked--; > >>>> + } > >>>> clear_bit(STRIPE_DEGRADED, &sh->state); > >>>> > >>>> set_bit(STRIPE_INSYNC, &sh->state); > >>>> -- > >>>> 2.20.1 > >>>> > >>>> > >>> Thanks Nigel! > >>> > >>> Pushed to https://github.com/liu-song-6/linux/tree/md-next . > >>> > >>> Song > >> From: Nigel Croxon <ncroxon@xxxxxxxxxx> > >> Date: Tue, 16 Apr 2019 15:37:30 -0400 > >> Subject: [PATCH] preserve the writeback action after the parity check > >> > >> > >> The problem is that any 'uptodate' vs 'disks' check is not precise > >> in this path. Put a "WARN_ON(!test_bit(R5_UPTODATE, &dev->flags)" on the > >> device that might try to kick off writes and then skip the action. > >> Better to prevent the raid driver from taking unexpected action *and* keep > >> the system alive vs killing the machine with BUG_ON. > >> > >> V2: change format '%ld' argument of type 'long int', to 'int'. > >> > >> Fixes: 4f4fd7c5798 Don't jump to compute_result state from check_result > >> state > >> > >> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > >> Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx> > > How about something like this? > > > > https://github.com/liu-song-6/linux/commit/88b4458c52ecbc3e016ec95ef866f19afe4550a0 > > The cast long is fine with me. > > -Nigel Actually, %td should be the best: https://github.com/liu-song-6/linux/commit/b2176a1dfb518d870ee073445d27055fea64dfb8 Thanks, Song > > >> --- > >> drivers/md/raid5.c | 10 +++++++++- > >> 1 file changed, 9 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > >> index f957b7e42562..395e7ec0fb6e 100644 > >> --- a/drivers/md/raid5.c > >> +++ b/drivers/md/raid5.c > >> @@ -4187,7 +4187,7 @@ static void handle_parity_checks6(struct r5conf > >> *conf, struct stripe_head *sh, > >> /* now write out any block on a failed drive, > >> * or P or Q if they were recomputed > >> */ > >> - BUG_ON(s->uptodate < disks - 1); /* We don't need Q to recover */ > >> + dev = NULL; > >> if (s->failed == 2) { > >> dev = &sh->dev[s->failed_num[1]]; > >> s->locked++; > >> @@ -4212,6 +4212,14 @@ static void handle_parity_checks6(struct r5conf > >> *conf, struct stripe_head *sh, > >> set_bit(R5_LOCKED, &dev->flags); > >> set_bit(R5_Wantwrite, &dev->flags); > >> } > >> + if (WARN_ONCE(dev && !test_bit(R5_UPTODATE, &dev->flags), > >> + "%s: disk%lx not up to date\n", > >> + mdname(conf->mddev), > >> + dev - (struct r5dev *) &sh->dev)) { > >> + clear_bit(R5_LOCKED, &dev->flags); > >> + clear_bit(R5_Wantwrite, &dev->flags); > >> + s->locked--; > >> + } > >> clear_bit(STRIPE_DEGRADED, &sh->state); > >> > >> set_bit(STRIPE_INSYNC, &sh->state); > >> -- > >> 2.20.1 > >> > >> > >> > >> > >>