On Thu, Sep 5, 2019 at 6:10 AM Nigel Croxon <ncroxon@xxxxxxxxxx> wrote:
>
> On 6/20/19 7:31 AM, Nigel Croxon wrote:
> > Hello All,
> >
> > When RAID6 is set up on dm-integrity target that detects massive
> > corruption, the leg will be ejected from the array.  Even if the issue
> > is correctable with a sector re-write and the array has necessary
> > redundancy to correct it.
> >
> > The leg is ejected because it runs up the rdev->read_errors beyond
> > conf->max_nr_stripes (600).
> >
> > The return status in dm-crypt when there is a data integrity error is
> > BLK_STS_PROTECTION.
> >
> > I propose we don't increment the read_errors when the bi->bi_status is
> > BLK_STS_PROTECTION.
> >
> >
> >  drivers/md/raid5.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index b83bce2beb66..ca73e60e33ed 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -2526,7 +2526,8 @@ static void raid5_end_read_request(struct bio * bi)
> >          int set_bad = 0;
> >
> >          clear_bit(R5_UPTODATE, &sh->dev[i].flags);
> > -        atomic_inc(&rdev->read_errors);
> > +        if (!(bi->bi_status == BLK_STS_PROTECTION))
> > +            atomic_inc(&rdev->read_errors);
> >          if (test_bit(R5_ReadRepl, &sh->dev[i].flags))
> >              pr_warn_ratelimited(
> >                  "md/raid:%s: read error on replacement device (sector %llu on %s).\n",
> >
> I'm up against this wall again.  We should continue to count errors
> returned by the lower layer,
>
> but if those errors are -EILSEQ, instead of -EIO, MD should not mark the
> device as failed.
>

Sorry for the very late reply.

I think the change is in the right direction. Please submit an official
patch so we can discuss the details.

Thanks,
Song
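
For discussion, here is a minimal userspace sketch of the second idea quoted above: keep counting every error the lower layer returns, but let only errors that are not integrity failures push the device toward ejection. It is not kernel code. The names model_status, model_rdev and model_end_read are hypothetical, and the threshold parameter only stands in for conf->max_nr_stripes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; every name here is a hypothetical stand-in. */
enum model_status {
    MODEL_STS_OK,
    MODEL_STS_IOERR,      /* stands in for a plain I/O error (-EIO) */
    MODEL_STS_PROTECTION  /* stands in for BLK_STS_PROTECTION (-EILSEQ) */
};

struct model_rdev {
    unsigned int read_errors;      /* every error reported by the lower layer */
    unsigned int integrity_errors; /* the subset that were integrity failures */
    bool faulty;                   /* set once the device would be ejected */
};

/*
 * Record the completion status of one read and return whether the
 * device is considered failed afterwards.
 */
static bool model_end_read(struct model_rdev *rdev, enum model_status status,
                           unsigned int max_errors)
{
    assert(rdev != NULL);

    if (status == MODEL_STS_OK)
        return rdev->faulty;

    /* All errors are still counted, as the quoted follow-up suggests. */
    rdev->read_errors++;
    if (status == MODEL_STS_PROTECTION)
        rdev->integrity_errors++;

    /* Only errors that are not integrity failures count toward ejection. */
    if (rdev->read_errors - rdev->integrity_errors > max_errors)
        rdev->faulty = true;

    return rdev->faulty;
}
```

With a threshold of 600, any number of MODEL_STS_PROTECTION completions leaves the modelled device in the array, while the 601st MODEL_STS_IOERR completion marks it faulty. Whether a real patch should keep a separate counter or skip the increment, as the quoted diff does, is exactly the detail left open for the official submission.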