Re: raid6 with dm-integrity should not cause device to fail

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/5/19 7:35 AM, Nigel Croxon wrote:
On 6/20/19 7:31 AM, Nigel Croxon wrote:
Hello All,

When RAID6 is set up on dm-integrity target that detects massive corruption, the leg will be ejected from the array.  Even if the issue is correctable with a sector re-write and the array has necessary redundancy to correct it.

The leg is ejected because it runs up the rdev->read_errors beyond conf->max_nr_stripes (600).

The return status in dm-crypt when there is a data integrity error is BLK_STS_PROTECTION.

I propose we don't increment the read_errors when the bi->bi_status is BLK_STS_PROTECTION.


 drivers/md/raid5.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b83bce2beb66..ca73e60e33ed 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2526,7 +2526,8 @@ static void raid5_end_read_request(struct bio * bi)
         int set_bad = 0;

         clear_bit(R5_UPTODATE, &sh->dev[i].flags);
-        atomic_inc(&rdev->read_errors);
+        if (!(bi->bi_status == BLK_STS_PROTECTION))
+            atomic_inc(&rdev->read_errors);
         if (test_bit(R5_ReadRepl, &sh->dev[i].flags))
             pr_warn_ratelimited(
                 "md/raid:%s: read error on replacement device (sector %llu on %s).\n",


I'm up against this wall again.  We should continue to count errors returned by the lower layer,

but if those errors are -EILSEQ, instead of -EIO, MD should not mark the device as failed.


https://securitypitfalls.wordpress.com/2018/05/08/raid-doesnt-work/




[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux