On 05/11/19 22:46, Nigel Croxon wrote:
>
> On 11/4/19 7:33 PM, Wols Lists wrote:
>> On 04/11/19 20:01, Nigel Croxon wrote:
>>> The MD driver for level-456 should prevent re-reading read errors.
>>>
>>> For redundant raid it makes no sense to retry the operation:
>>> When one of the disks in the array hits a read error, that will
>>> cause a stall for the reading process:
>>> - either the read succeeds (e.g. after 4 seconds the HDD error
>>> strategy could read the sector)
>>> - or it fails after HDD imposed timeout (w/TLER, e.g. after 7
>>> seconds (might be even longer)
>> Okay, I'm being completely naive here, but what is going on? Are you
>> saying that if we hit a read error, we just carry on, ignore it, and
>> calculate the missing block from parity?
>>
>> If so, what happens if we hit two errors on a raid-5, or 3 on a raid-6,
>> or whatever ... :-)
>>
>> Cheers,
>> Wol
>
> This allows the device (disk) to fail faster. All logic is the same.
> If there is a read error, it does not retry that read, it calculates
> the data from the other disks. This patch removes the retry.
>
Ummm ... I suspect there is a very good reason for the retry ...

Bear in mind I don't actually KNOW anything, you'll need to check with
someone who knows about these things, but I get the impression that
transient errors aren't that uncommon. It fails, you try again, it
succeeds.

So if you're going to go down that route, by all means re-calculate from
parity if ONE read fails, but if you get more failures such that the
raid fails, you need to retry those reads because there is a good chance
they will succeed second time round.

Cheers,
Wol
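
To make the suggestion concrete, here is a rough Python sketch of the policy described above, for a single-parity (raid-5 style) stripe only. It is a toy model, not the md driver code: the names `read_stripe`, `flaky`, `xor_blocks` and the `max_retries` parameter are made up for illustration, and each "disk" is just a callable that either returns its block or raises `IOError`.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def flaky(data, failures):
    """Hypothetical disk: fails the first `failures` reads, then returns data."""
    state = {"left": failures}

    def read():
        if state["left"] > 0:
            state["left"] -= 1
            raise IOError("simulated read error")
        return data

    return read


def read_stripe(readers, max_retries=2):
    """Read one stripe (data blocks plus one XOR parity block).

    Returns (blocks, retry_rounds). Raises IOError if the stripe is lost.
    """
    blocks = [None] * len(readers)
    failed = []

    # First pass: read every disk once, no retry.
    for i, read in enumerate(readers):
        try:
            blocks[i] = read()
        except IOError:
            failed.append(i)

    # More failures than parity can cover: only now retry the failed
    # reads, in case the errors were transient.
    rounds = 0
    while len(failed) > 1 and rounds < max_retries:
        rounds += 1
        still_failed = []
        for i in failed:
            try:
                blocks[i] = readers[i]()
            except IOError:
                still_failed.append(i)
        failed = still_failed

    if len(failed) > 1:
        raise IOError("stripe unrecoverable: too many read errors")

    # Exactly one block missing: rebuild it from the others, no retry.
    if failed:
        blocks[failed[0]] = xor_blocks([b for b in blocks if b is not None])

    return blocks, rounds
```

With one failed disk the stripe comes back straight from parity with zero retry rounds; with two failed disks the reads are retried, and the stripe is only declared lost if they keep failing.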