Re: smart short test crashes software raid array?

Wols Lists <antlists@xxxxxxxxxxxxxxx> · Mon, 11 Mar 2019 18:14:00 +0000

On 11/03/19 12:31, Nix wrote:
> On 10 Mar 2019, Wols Lists uttered the following:
> 
>> I'd like to modify the raid layer such that it times out quickly, and
>> recalculates and rewrites the data after a few seconds, such that these
>> drives cease to be a problem, but stick that on the long list of raid
>> papercuts I'd like to sort out when I can find the time to learn to
>> program the raid subsystem!

> I don't see how that could work. When these drives get stuck on lengthy
> retries, they are essentially unresponsive: 

So any code needs to take that into account. Pain in the arse, but when
the Linux read times out, the re-write code needs to detect that the
drive is one of these cheapos, and spawn a thread that waits out the
drive's own time-out before rewriting the block.

Of course, that's going to cause a host of other issues that will need
sorting/fixing :-) - the obvious one is what happens if something else
re-writes that block in the middle of the time-out period ...
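
To pin the idea down, here is a toy user-space model in Python rather than
kernel C. Everything in it is hypothetical: DeferredRewriter, write_block
and drive_timeout_s are made-up names, nothing to do with the real md code,
and working out which drives need this treatment is left out entirely. It
deals with the "something else re-writes the block" case by counting writes
per block, and dropping the deferred rewrite if the count has moved by the
time the drive comes back.

```python
import threading


class DeferredRewriter:
    """Toy model only: delays a corrective rewrite until the drive's own
    error-recovery time-out should have expired (all names hypothetical)."""

    def __init__(self, write_block, drive_timeout_s):
        self._write_block = write_block          # callable(block, data)
        self._drive_timeout_s = drive_timeout_s  # assumed known per drive
        self._lock = threading.Lock()
        self._generation = {}                    # block -> writes seen so far

    def write(self, block, data):
        """Normal write path: bump the block's generation, then write."""
        with self._lock:
            self._generation[block] = self._generation.get(block, 0) + 1
            self._write_block(block, data)

    def schedule_rewrite(self, block, reconstructed):
        """Call when a read times out and the data has been rebuilt from
        the other drives. Returns the timer thread so callers can join it."""
        with self._lock:
            seen = self._generation.get(block, 0)

        def worker():
            with self._lock:
                # A newer write landed during the wait: the rebuilt data is
                # stale, so do not clobber what is there now.
                if self._generation.get(block, 0) != seen:
                    return
                self._write_block(block, reconstructed)

        timer = threading.Timer(self._drive_timeout_s, worker)
        timer.start()
        return timer
```

Holding one lock across both the check and the write is the lazy way out; it
is good enough to show the ordering problem, not a suggestion for how the
raid layer should actually serialise things.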

Cheers,
Wol