On Thu, Mar 28, 2019 at 5:16 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
>
> On 3/26/19 7:33 PM, Song Liu wrote:
>
> On Tue, Mar 26, 2019 at 9:08 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
>
> Changing state from check_state_check_result to
> check_state_compute_result not only is unsafe but also doesn't
> appear to serve a valid purpose. A raid6 check should only be
> pushing out extra writes if doing repair and a mismatch occurs.
> The stripe dev management will already try to do repair writes
> for failing sectors.
>
> Could you please explain more details about the failed case before
> this patch?
>
> Thanks,
> Song
>
> Reproduced the issue with a modified scsi_debug module. Redefined
> OPT_MEDIUM_ERR_ADDR to 12000 to move the I/O error where I wanted it
> on the disk. I loaded the modified scsi_debug with the parameter
> dev_size_mb=250 to make it bigger, and partitioned it to build a test
> raid6 device from:
>
> Device      Boot   Start      End   Blocks  Id  System
> /dev/sde1           2048    94207    46080  83  Linux
> /dev/sde2          94208   186367    46080  83  Linux
> /dev/sde3         186368   278527    46080  83  Linux
> /dev/sde4         278528   511999   116736   5  Extended
> /dev/sde5         280576   372735    46080  83  Linux
> /dev/sde6         374784   466943    46080  83  Linux
>
> The raid6 device was created with:
>
> mdadm --create /dev/md127 --level=6 --raid-devices=5 /dev/sde1
> /dev/sde2 /dev/sde3 /dev/sde5 /dev/sde6
>
> After initially running, a check can be triggered and it works
> without issue. The medium error feature of scsi_debug can then be
> turned on with:
>
> echo "2" >/sys/module/scsi_debug/parameters/opts
>
> Sector 12000, which is part of sde1 and contains the q parity
> information for its stripe of the raid6 array, will now fail to read.
> Running a new check operation crashes the system at the same BUG_ON
> as seen by the customer.
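>
> Putting those steps together, the whole sequence is roughly the
> following sketch (device names will vary, and the check is kicked
> off through the usual md sync_action interface):
>
>   # load the modified scsi_debug (OPT_MEDIUM_ERR_ADDR redefined to 12000)
>   modprobe scsi_debug dev_size_mb=250
>   # partition the resulting disk as in the table above, then:
>   mdadm --create /dev/md127 --level=6 --raid-devices=5 \
>       /dev/sde1 /dev/sde2 /dev/sde3 /dev/sde5 /dev/sde6
>   # wait for the initial resync to finish; a check now runs clean
>   echo check > /sys/block/md127/md/sync_action
>   # turn on medium errors so reads of sector 12000 start failing
>   echo 2 > /sys/module/scsi_debug/parameters/opts
>   # re-run the check; without the patch this hits the BUG_ON
>   echo check > /sys/block/md127/md/sync_action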
>
> This patch makes the raid6 check_state_check_result handling
> work more like raid5's. If there are somehow too many failures
> for a check, just quit the check operation for the stripe. When
> any checks pass, don't try to use check_state_compute_result for
> a purpose it isn't needed for and is unsafe for. Just mark the
> stripe as in sync for passing its parity checks and let the
> stripe dev read/write code and the bad blocks list do their
> job handling I/O errors.

+ Dan Williams. But I am not sure whether Dan still remembers this
work. :) Dan, please feel free to ignore this. :)

Thanks Nigel and Xiao for these details. I will process the patch.

Regards,
Song

>
> OriginalAuthor: David Jeffery <djeffery@xxxxxxxxxx>
> Tested-by: David Jeffery <djeffery@xxxxxxxxxx>
> Signed-off-by: David Jeffery <djeffery@xxxxxxxxxx>
> Signed-off-by: Nigel Croxon <ncroxon@xxxxxxxxxx>
> ---
>  drivers/md/raid5.c | 19 ++++---------------
>  1 file changed, 4 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index c033bfcb209e..4204a972ecf2 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -4223,26 +4223,15 @@ static void handle_parity_checks6(struct r5conf *conf, struct stripe_head *sh,
>   case check_state_check_result:
>       sh->check_state = check_state_idle;
>
> +     if (s->failed > 1)
> +         break;
>       /* handle a successful check operation, if parity is correct
>        * we are done.  Otherwise update the mismatch count and repair
>        * parity if !MD_RECOVERY_CHECK
>        */
>       if (sh->ops.zero_sum_result == 0) {
> -         /* both parities are correct */
> -         if (!s->failed)
> -             set_bit(STRIPE_INSYNC, &sh->state);
> -         else {
> -             /* in contrast to the raid5 case we can validate
> -              * parity, but still have a failure to write
> -              * back
> -              */
> -             sh->check_state = check_state_compute_result;
> -             /* Returning at this point means that we may go
> -              * off and bring p and/or q uptodate again so
> -              * we make sure to check zero_sum_result again
> -              * to verify if p or q need writeback
> -              */
> -         }
> +         /* Any parity checked was correct */
> +         set_bit(STRIPE_INSYNC, &sh->state);
>       } else {
>           atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
>           if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) {
> --
> 2.20.1
>