Re: raid6 with dm-integrity should not cause device to fail

On Thu Sep 05, 2019 at 11:29:07AM -0400, Nigel Croxon wrote:

> On 9/5/19 7:35 AM, Nigel Croxon wrote:
> > On 6/20/19 7:31 AM, Nigel Croxon wrote:
> >> Hello All,
> >>
> >> When RAID6 is set up on a dm-integrity target that detects massive
> >> corruption, the leg will be ejected from the array, even if the
> >> issue is correctable with a sector re-write and the array has the
> >> necessary redundancy to correct it.
> >>
> >> The leg is ejected because the corruption drives rdev->read_errors
> >> beyond conf->max_nr_stripes (600).
> >>
> >> The return status in dm-crypt when there is a data integrity error is 
> >> BLK_STS_PROTECTION.
> >>
> >> I propose we don't increment read_errors when bi->bi_status is
> >> BLK_STS_PROTECTION.
> >>
> >>
> >>  drivers/md/raid5.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> index b83bce2beb66..ca73e60e33ed 100644
> >> --- a/drivers/md/raid5.c
> >> +++ b/drivers/md/raid5.c
> >> @@ -2526,7 +2526,8 @@ static void raid5_end_read_request(struct bio *bi)
> >>          int set_bad = 0;
> >>
> >>          clear_bit(R5_UPTODATE, &sh->dev[i].flags);
> >> -        atomic_inc(&rdev->read_errors);
> >> +        if (bi->bi_status != BLK_STS_PROTECTION)
> >> +            atomic_inc(&rdev->read_errors);
> >>          if (test_bit(R5_ReadRepl, &sh->dev[i].flags))
> >>              pr_warn_ratelimited(
> >>                  "md/raid:%s: read error on replacement device (sector %llu on %s).\n",
> >
> >
> > I'm up against this wall again.  We should continue to count errors
> > returned by the lower layer, but if those errors are -EILSEQ rather
> > than -EIO, MD should not mark the device as failed.
> >
> >
> https://securitypitfalls.wordpress.com/2018/05/08/raid-doesnt-work/
> 
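
For illustration only, here is a minimal user-space C sketch of the
policy described above: every read error is still counted, but only
media errors (-EIO), not integrity mismatches (-EILSEQ /
BLK_STS_PROTECTION), push the device toward being failed.  The names,
threshold, and counters are hypothetical stand-ins, not the actual
raid5.c logic.

#include <stdio.h>

/* Hypothetical stand-ins - the real code lives in drivers/md/raid5.c
 * and uses atomic_t counters on struct md_rdev. */
#define MAX_READ_ERRORS 600  /* stand-in for conf->max_nr_stripes */

enum blk_status { BLK_OK, BLK_EIO, BLK_PROTECTION };

struct rdev_sim {
    int read_errors;   /* every read error, kept for reporting */
    int media_errors;  /* only -EIO-style errors; these fail the device */
    int faulty;
};

/* Count every error, but only eject the device when media errors
 * (not integrity mismatches) exceed the threshold. */
static void end_read_request(struct rdev_sim *rdev, enum blk_status status)
{
    if (status == BLK_OK)
        return;
    rdev->read_errors++;
    if (status != BLK_PROTECTION && ++rdev->media_errors > MAX_READ_ERRORS)
        rdev->faulty = 1;  /* md would call md_error() at this point */
}

int main(void)
{
    struct rdev_sim rdev = {0};
    int i;

    for (i = 0; i < 1000; i++)  /* integrity errors alone never fail it */
        end_read_request(&rdev, BLK_PROTECTION);
    printf("after 1000 integrity errors: faulty=%d counted=%d\n",
           rdev.faulty, rdev.read_errors);

    for (i = 0; i < 1000; i++)  /* sustained media errors still do */
        end_read_request(&rdev, BLK_EIO);
    printf("after 1000 media errors: faulty=%d\n", rdev.faulty);
    return 0;
}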

I'm not clear what you (or the author of the article) are expecting
here. You've got a disk (or disks) with thousands of read errors -
whether these are dm-integrity mismatches or disk-level read errors
doesn't matter - the disk is toast and needs replacing ASAP (or it's in
an environment where it - and you - probably shouldn't be).

Admittedly, with dm-integrity we can probably trust that anything read
from the disk which makes it past the integrity check is valid, so there
may be cases where the data on there is needed to complete a stripe.
That seems a rather theoretical and contrived circumstance though - in
most cases you're better off just kicking the drive from the array so
the admin knows it needs replacing.

In the more normal case, dm-integrity will return occasional read errors
due to cosmic rays, bitrot, or whatever else, and md will rewrite the
blocks with the correct data - everything keeps running and everyone's
happy. And when disks do fail, dm-integrity will give you read errors on
the faulty disk instead of possibly returning the wrong data, so you
don't later end up with data mismatches and have to figure out which
drive caused them.
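
As a rough illustration of that self-healing path, here is a toy
user-space C sketch - a 3-leg XOR stripe standing in for the real
RAID5/6 parity math, not the kernel's actual recovery code - where a
block that fails its read is reconstructed from the surviving legs and
rewritten in place:

#include <stdio.h>
#include <string.h>

#define BLK 8  /* toy block size */

/* XOR of the two surviving blocks recovers the third - a stand-in
 * for the real parity math in drivers/md. */
static void reconstruct(unsigned char *dst, const unsigned char *a,
                        const unsigned char *b)
{
    int i;
    for (i = 0; i < BLK; i++)
        dst[i] = a[i] ^ b[i];
}

int main(void)
{
    unsigned char d0[BLK] = "RAIDSIX", d1[BLK] = "HEALTHY", p[BLK];

    reconstruct(p, d0, d1);   /* compute parity for the stripe */

    memset(d0, 0, BLK);       /* simulate a failed/rejected read of d0 */
    reconstruct(d0, d1, p);   /* rebuild d0 and "rewrite" it in place */
    printf("recovered block: %s\n", d0);  /* prints RAIDSIX */
    return 0;
}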

Cheers,
    Robin

-- 
     ___        
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



