Re: [PATCH] raid456: avoid second retry of read-error

Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx> · Wed, 6 Nov 2019 17:02:48 +0100

On 11/5/19 10:11 PM, Song Liu wrote:
> On Mon, Nov 4, 2019 at 4:33 PM Wols Lists <antlists@xxxxxxxxxxxxxxx> wrote:
>> On 04/11/19 20:01, Nigel Croxon wrote:
>>> The MD driver for level-456 should prevent re-reading read errors.
>>>
>>> For redundant raid it makes no sense to retry the operation:
>>> When one of the disks in the array hits a read error, that will
>>> cause a stall for the reading process:
>>> - either the read succeeds (e.g. after 4 seconds the HDD error
>>>   strategy could read the sector)
>>> - or it fails after the HDD-imposed timeout (w/TLER, e.g. after 7
>>>   seconds; it might be even longer)
>> Okay, I'm being completely naive here, but what is going on? Are you
>> saying that if we hit a read error, we just carry on, ignore it, and
>> calculate the missing block from parity?
>>
>> If so, what happens if we hit two errors on a raid-5, or 3 on a raid-6,
>> or whatever ... :-)
> Based on my understanding (no data on this), the drive will retry the read
> internally before returning an error. Therefore, a host-level retry doesn't
> really help. But I could be wrong.

The read which bypasses the cache could retry too; should it be
changed as well?

Thanks,
Guoqing
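
For the question quoted above about calculating the missing block from
parity, here is a minimal sketch of single-parity (raid-5 style) recovery.
It is only an illustration, not the md driver's code: the stripe layout
(data blocks followed by one XOR parity block) and the names xor_blocks and
read_stripe are assumptions made for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_stripe(stripe, failed):
    """Return the data blocks of one stripe, rebuilding at most one lost block.

    stripe: list of blocks; the last one is the XOR parity (hypothetical layout).
    failed: set of indices whose reads returned an error.
    """
    if len(failed) > 1:
        # Single parity gives one equation per byte, so two unknowns
        # cannot be solved for.
        raise OSError("too many read errors for single parity")
    blocks = list(stripe)
    for idx in failed:
        # The lost block is the XOR of every surviving block, parity included.
        survivors = [b for i, b in enumerate(stripe) if i != idx]
        blocks[idx] = xor_blocks(survivors)
    return blocks[:-1]  # drop the parity block, keep the data


data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]
```

With one failed read, read_stripe(stripe, {1}) rebuilds the block from the
survivors without touching the failed device again. With two failed reads in
the same stripe it raises an error, which is the raid-5 case asked about
above; raid-6 keeps a second, independent syndrome and can therefore
tolerate two.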