Re: smart short test crashes software raid array?

On 12/3/19 19:09, Wols Lists wrote:
On 11/03/19 22:37, Adam Goryachev wrote:
On 12/3/19 5:14 am, Wols Lists wrote:
On 11/03/19 12:31, Nix wrote:
On 10 Mar 2019, Wols Lists uttered the following:

I'd like to modify the raid layer such that it times out quickly, and
recalculates and rewrites the data after a few seconds, such that
these
drives cease to be a problem, but stick that on the long list of raid
papercuts I'd like to sort out when I can find the time to learn to
program the raid subsystem!
I don't see how that could work. When these drives get stuck on lengthy
retries, they are essentially unresponsive:
So any code needs to take that into account. Pain in the arse, but when
the linux read times out, the re-write code needs to detect that the
drive is one of these cheapos, and spawn a thread that waits for the
drive time-out before rewriting it.

Of course, that's going to cause a host of other issues that will need
sorting/fixing :-) - the obvious one is what happens if something else
re-writes that block in the middle of the time-out period ...

Cheers,
Wol
Doesn't this happen already? The drive will either return the data (if
it magically succeeds in reading the requested data in those ~180
seconds), or it will return a read error.
But that's the whole point - THAT IS UNACCEPTABLE.

What I would like to make happen is that

1) Linux issues a read request ...

we have a read error so

2) Linux times out after 7 seconds

3) The raid code computes the missing block and passes it back to the user

4) The raid code spots that the disk has a 180 s timeout *so it waits*

5) The block is rewritten.

You're missing the point that that 180s wait really f***s things up for
people, and/or they don't realise that there's a problem until they hit it.
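(For context: the mismatch being described is the usual SCT ERC vs kernel SCSI timeout problem, where the kernel gives up after ~7 s but the drive keeps retrying internally for up to ~180 s. A dry-run sketch of the standard manual mitigation, with the device name invented; the function only prints the commands it would run, it executes nothing:)

```shell
# Print (do not run) the commands that would make a drive's error
# handling play nicely with md raid. Device name is hypothetical.
set_erc_commands() {
    local drive="$1"          # e.g. /dev/sdb
    local name="${drive##*/}" # e.g. sdb
    # Ask the drive to give up after 7.0 s (units are deciseconds),
    # shorter than the kernel's read timeout:
    echo "smartctl -l scterc,70,70 $drive"
    # If the drive doesn't support SCT ERC, raise the kernel timeout
    # above the drive's worst-case internal retry (~180 s) instead:
    echo "echo 180 > /sys/block/$name/device/timeout"
}

set_erc_commands /dev/sdb
```

Either change avoids the kernel kicking a still-retrying drive out of the array; neither fixes the underlying papercut this thread is about.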

My solution is a very good fix apart from the fact that step 4 is a pile
of spaghetti waiting to cause havoc ... :-)
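The five steps above, and the step-4 race, can be modelled in a toy sketch. Everything here is invented for illustration (names, a generation counter as the conflict check, timeouts scaled down to fractions of a second); it is not real md code:

```python
import threading
import time

# Stand-ins for the 7 s kernel read timeout and the ~180 s drive
# internal retry, scaled down so the sketch runs quickly.
KERNEL_TIMEOUT = 0.01
DRIVE_TIMEOUT = 0.2

class Stripe:
    """A single block plus a generation counter bumped on every write."""
    def __init__(self, data):
        self.data = data
        self.generation = 0
        self.lock = threading.Lock()

    def write(self, data):
        with self.lock:
            self.data = data
            self.generation += 1

def deferred_rewrite(stripe, reconstructed, gen_at_read):
    """Steps 4-5: wait out the drive timeout, then rewrite the block --
    unless something else wrote it in the meantime (the step-4 race)."""
    time.sleep(DRIVE_TIMEOUT)
    with stripe.lock:
        if stripe.generation == gen_at_read:
            stripe.data = reconstructed
            stripe.generation += 1
            return True
        return False  # a newer write won; drop the stale rewrite

def read_with_recovery(stripe):
    """Steps 1-3: the read times out, parity math reconstructs the
    block, and the caller gets its data back immediately while the
    rewrite is handed to a worker thread."""
    time.sleep(KERNEL_TIMEOUT)       # simulated read timeout
    reconstructed = stripe.data      # stand-in for the parity recompute
    with stripe.lock:
        gen = stripe.generation
    t = threading.Thread(target=deferred_rewrite,
                         args=(stripe, reconstructed, gen))
    t.start()
    return reconstructed, t
```

The generation check is one way to defuse the race: if a normal write lands during the wait, the deferred rewrite notices the counter moved and skips, rather than clobbering newer data with the reconstructed old block.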

OK, now I think I understand your intention better, and yes, it probably is a better solution, as long as all the edge cases could be solved. I suspect it is a lot more tricky than it would first appear.

Other things to consider include what to do with the writes intended for this disk while it is busy... potentially the out-of-sync (write-intent) bitmap is useful here...

In fact, why not just eject the drive, and then when it eventually comes back, let udev "re-add" the drive, and let the bitmap get it up to date?

Then again, do you really want udev to auto-add a drive that has "failed"?
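Done by hand, the eject/re-add sequence would look roughly like this (a dry-run sketch: the function only prints the commands, and the array/member names are made up; it assumes the array already has a write-intent bitmap):

```shell
# Print (do not run) the manual eject-and-re-add sequence for a
# stalled member of a bitmap-equipped array. Names are hypothetical.
readd_commands() {
    local array="$1" member="$2"
    echo "mdadm $array --fail $member"    # eject the unresponsive drive
    echo "mdadm $array --remove $member"
    # Once the drive responds again, re-add it; the write-intent
    # bitmap limits the resync to blocks written while it was out:
    echo "mdadm $array --re-add $member"
}

readd_commands /dev/md0 /dev/sdc1
```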

Regards,
Adam
