Re: [PATCH] raid456: avoid second retry of read-error

Wols Lists <antlists@xxxxxxxxxxxxxxx> · Thu, 7 Nov 2019 13:35:39 +0000

On 07/11/19 13:17, Xiao Ni wrote:
> 
> 
> On 11/05/2019 08:33 AM, Wols Lists wrote:
>> On 04/11/19 20:01, Nigel Croxon wrote:
>>> The MD driver for level-456 should prevent re-reading read errors.
>>>
>>> For redundant raid it makes no sense to retry the operation:
>>> When one of the disks in the array hits a read error, that will
>>> cause a stall for the reading process:
>>> - either the read succeeds (e.g. after 4 seconds the HDD error
>>> strategy could read the sector)
>>> - or it fails after the HDD-imposed timeout (w/TLER, e.g. after 7
>>> seconds; might be even longer)
>> Okay, I'm being completely naive here, but what is going on? Are you
>> saying that if we hit a read error, we just carry on, ignore it, and
>> calculate the missing block from parity?
>>
>> If so, what happens if we hit two errors on a raid-5, or 3 on a raid-6,
>> or whatever ... :-)
>>
> Hi Wol
> 
> What's the meaning of "two errors on a raid-5"? Two read errors happen
> on one disk?
> Or there are two read errors on two disks?
> 
Two read errors on two disks, so that you can't recalculate from parity.

Basically, what I was thinking was "does this patch mean that if we get
a read error, we read the parity instead and recalculate the block that
failed?". If that is the case, what happens if we get a second read
error and can't recalculate?
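
To make the question concrete, here is a toy sketch of single-parity
(raid-5 style) reconstruction. It is not the md driver's code; the names
xor_blocks and reconstruct and the sample data are made up purely for
illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct(stripe, failed):
    """Rebuild the one unreadable block of a stripe from the survivors.

    stripe: list of blocks (data blocks followed by one parity block)
    failed: set of indices that returned a read error
    """
    if len(failed) != 1:
        # Single parity carries enough information for exactly one block.
        raise IOError(f"{len(failed)} unreadable blocks, cannot rebuild from one parity")
    (idx,) = failed
    survivors = [blk for i, blk in enumerate(stripe) if i != idx]
    return xor_blocks(survivors)

# Hypothetical stripe: three data blocks plus their XOR parity.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
stripe = data + [xor_blocks(data)]
```

With one failed index, reconstruct(stripe, {1}) gives back data[1]. With
two, e.g. reconstruct(stripe, {0, 1}), it raises, which is the "can't
recalculate" case I'm asking about.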

Because, as I understand real-world behaviour, it's quite normal for the
first read to fail and the retry to succeed. So if this patch does what I
think (feel free to tell me I'm wrong :-) ), a double read error would
make raid return a read error, when the old code would have retried and
completed the read successfully, if a bit more slowly.
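
A toy model of the scenario I'm worried about, assuming the patch means
"never re-read, go straight to parity". That assumption may well be
wrong, and nothing here is taken from the md code: FlakyDisk, read_block
and make_disks are hypothetical names for illustration only.

```python
class FlakyDisk:
    """Toy disk whose first `fail_first` reads raise, then succeed."""
    def __init__(self, block, fail_first=0):
        self.block = block
        self.remaining_failures = fail_first

    def read(self):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise IOError("read error")
        return self.block

def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

def read_block(disks, target, retries):
    """Read disks[target]; after `retries` failed re-reads, fall back to parity."""
    for _ in range(1 + retries):
        try:
            return disks[target].read()
        except IOError:
            pass
    # Fallback: XOR every other block in the stripe. A read error on any
    # of them propagates to the caller, since nothing is left to rebuild from.
    others = [d.read() for i, d in enumerate(disks) if i != target]
    return xor_blocks(others)

def make_disks():
    """Three data disks plus parity; disks 0 and 1 each fail their first read."""
    data = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
    return [FlakyDisk(data[0], 1), FlakyDisk(data[1], 1),
            FlakyDisk(data[2]), FlakyDisk(xor_blocks(data))]
```

In this model read_block(make_disks(), 0, retries=1) returns the data,
because the re-read of disk 0 succeeds, while
read_block(make_disks(), 0, retries=0) raises, because the parity
fallback trips over the transient error on disk 1.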

Cheers,
Wol