read error recovery threshold

Eric Mei <meijia@xxxxxxxxx> · Mon, 15 Sep 2014 10:56:11 -0600

Hi,

After a read error is detected, RAID6 initiates a recovery procedure to
try to correct it, until the number of read errors exceeds a threshold,
which is "conf->max_nr_stripes" (see raid5_end_read_request()). I'm
wondering about the reasoning behind this. To me the threshold seems to
be a drive property, but max_nr_stripes is an array-wide cache setting
and can be changed at runtime. In our specific case, we observed a drive
emitting lots of read errors without being marked as faulty because of
the larger max_nr_stripes setting.
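
To make the concern concrete, here is a minimal user-space sketch of the
comparison as described above. It is not the kernel code: the struct and
function names (model_conf, model_rdev, should_fail_device) are made up
for illustration, and only the comparison against max_nr_stripes comes
from the description.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the array config and the per-drive state. */
struct model_conf {
    int max_nr_stripes;   /* array-wide stripe cache size, tunable at runtime */
};

struct model_rdev {
    int read_errors;      /* read errors seen so far on this drive */
};

/*
 * Returns true when the drive would be failed under the rule described
 * above: the drive's read error count exceeds the array's stripe cache size.
 */
static bool should_fail_device(const struct model_conf *conf,
                               const struct model_rdev *rdev)
{
    return rdev->read_errors > conf->max_nr_stripes;
}
```

With the same error count on the drive, the outcome flips purely on the
cache setting: a drive with 300 read errors is failed when max_nr_stripes
is 256, but kept when it is raised to 4096 (values chosen only for
illustration).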

Looking at other parts of the MD code, there is
"mddev::max_corr_read_errors", which is set to 20, but only RAID10 makes
use of it. Also, the comment above MD_DEFAULT_MAX_CORRECTED_READ_ERRORS
says "...We divide the read error count by 2 for every hour elapsed
between read errors", but I don't see any code matching this
description.
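
For reference, the decay that comment describes could look roughly like
the following. This is only a user-space sketch of the stated rule (halve
the count for every full hour between errors); the function name
decay_read_errors and its parameters are hypothetical and are not taken
from the MD code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: given the current read error count and the times
 * (in seconds) of the previous and the current read error, return the
 * count after halving it once per full hour elapsed between the two.
 */
static unsigned int decay_read_errors(unsigned int read_errors,
                                      uint64_t last_error_sec,
                                      uint64_t now_sec)
{
    uint64_t hours;

    /* Clock went backwards or no time elapsed: leave the count alone. */
    if (now_sec <= last_error_sec)
        return read_errors;

    hours = (now_sec - last_error_sec) / 3600;

    /* Shifting by the full width or more is undefined in C; clamp to 0. */
    if (hours >= sizeof(read_errors) * 8)
        return 0;

    return read_errors >> hours;
}
```

Under this rule a count of 20 stays at 20 if the next error arrives
within the hour, drops to 10 after one hour and to 5 after two, so a
per-drive limit such as max_corr_read_errors would only trip on errors
that cluster in time.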

Any thoughts? Thanks

Eric