On 6/20/19 7:31 AM, Nigel Croxon wrote:
Hello All,
When RAID6 is set up on a dm-integrity target that detects massive
corruption, the leg is ejected from the array, even if the issue is
correctable with a sector rewrite and the array has the redundancy
needed to correct it.
The leg is ejected because rdev->read_errors climbs past
conf->max_nr_stripes (600).
The return status in dm-crypt when there is a data integrity error is
BLK_STS_PROTECTION.
I propose we don't increment the read_errors when the bi->bi_status is
BLK_STS_PROTECTION.
drivers/md/raid5.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b83bce2beb66..ca73e60e33ed 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2526,7 +2526,8 @@ static void raid5_end_read_request(struct bio * bi)
 		int set_bad = 0;

 		clear_bit(R5_UPTODATE, &sh->dev[i].flags);
-		atomic_inc(&rdev->read_errors);
+		if (!(bi->bi_status == BLK_STS_PROTECTION))
+			atomic_inc(&rdev->read_errors);
 		if (test_bit(R5_ReadRepl, &sh->dev[i].flags))
 			pr_warn_ratelimited(
 				"md/raid:%s: read error on replacement device (sector %llu on %s).\n",
I'm up against this wall again. We should continue to count errors
returned by the lower layer, but if those errors are -EILSEQ rather
than -EIO, MD should not mark the device as failed.
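
To make that policy concrete, here is a minimal userspace sketch, not
kernel code and not a patch. struct leg_model, leg_model_read_failed()
and max_errors are hypothetical names used only for illustration. The
sketch assumes the lower layer reports a negative errno, and that only
non-integrity errors are compared against the threshold. That second
assumption is one possible reading of the idea above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one array leg; not struct md_rdev. */
struct leg_model {
	unsigned int read_errors;	/* every failed read is still counted */
	unsigned int integrity_errors;	/* the subset reported as -EILSEQ */
};

/*
 * Record one failed read and report whether the leg should be failed.
 * err is a negative errno from the lower layer. max_errors stands in
 * for the ejection threshold (an assumption of this sketch).
 */
static bool leg_model_read_failed(struct leg_model *leg, int err,
				  unsigned int max_errors)
{
	assert(leg != NULL);
	assert(err < 0);

	/* Keep counting everything the lower layer returns. */
	leg->read_errors++;

	if (err == -EILSEQ) {
		/*
		 * Integrity mismatch: the sector can be rewritten from
		 * redundancy, so never fail the leg for this alone.
		 */
		leg->integrity_errors++;
		return false;
	}

	/* Only non-integrity errors (for example -EIO) can fail the leg. */
	return leg->read_errors - leg->integrity_errors > max_errors;
}
```

With this model, any number of -EILSEQ results leaves the leg in the
array while read_errors keeps growing, and the leg is failed only once
the count of other errors exceeds max_errors.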