Re: smart short test crashes software raid array?

Wols Lists <antlists@xxxxxxxxxxxxxxx> · Tue, 12 Mar 2019 09:02:15 +0000

On 12/03/19 08:23, Adam Goryachev wrote:
>>> Doesn't this happen already? The drive will either return the data (if
>>> it magically succeeds in reading the requested data in that 180?
>>> seconds), or it will return a read error.

>> But that's the whole point - THAT IS UNACCEPTABLE.
>>
>> What I would like to make happen is that
>>
>> 1) Linux issues a read request ...
>>
>> we have a read error so
>>
>> 2) Linux times out after 7 seconds
>>
>> 3) The raid code computes the missing block and passes it back to the
>> user
>>
>> 4) The raid code spots that the disk has a 180s timeout *so it waits*
>>
>> 5) The block is rewritten.
>>
>> You're missing the point that that 180s wait really f***s things up for
>> people, and/or they don't realise that there's a problem until they
>> hit it.
>>
>> My solution is a very good fix apart from the fact that step 4 is a pile
>> of spaghetti waiting to cause havoc ... :-)
> 
> OK, now I think I understand your intention better, and yes, it probably
> is a better solution, as long as all the edge cases could be solved. I
> suspect it is a lot more tricky than it would first appear.
> 
> Other things to consider include what to do with the writes intended for
> this disk while the disk is busy... potentially using the out of sync
> bitmap is useful here....
> 
> In fact, why not just eject the drive, and then when it eventually comes
> back, let udev "re-add" the drive, and let the bitmap get it up to date?
> 
> Then again, do you really want udev to auto-add a drive that has "failed"?
> 
Depends ... given that this is a typical failure case ... but given that
all too often we have a luser in control of a drive that will destroy
their entire on-line persona if it fails, do we really want to paper
over the common case and hide the disaster ... yes I do know where this
is going.

That said, you've given me a very good idea. How's this for the ideal
scenario?

Linux issues a read.

1a) We have an SCT/ERC drive, the drive times out after 7s, the current
path is followed ...

1b) We have a desktop drive, Linux times out after 10s, the stripe is
re-computed.

2) Doesn't raid lock the stripe before writing it back? raid takes a
lock, takes a 180s timeout, and writes the stripe. The current path is
*almost* followed - a fail will kick the drive.

3) raid re-reads the stripe, checks the time taken, and if it took
longer than 7s, spews "imminent disk fail" warnings.
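
As a rough sketch of that flow, here it is as a toy simulation in Python.
This is purely illustrative and is not md code: SimDrive, handle_read and
the event strings are names made up for the example, and the drive is
modelled only by how long each operation takes. The 7s, 10s and 180s
figures are the ones from the steps above.

```python
from dataclasses import dataclass
from typing import List

ERC_TIMEOUT_S = 7.0        # step 1a: SCT/ERC drive gives up after 7s
KERNEL_TIMEOUT_S = 10.0    # step 1b: Linux times out a desktop drive after 10s
REWRITE_TIMEOUT_S = 180.0  # step 2: timeout held while the stripe is rewritten
SLOW_REREAD_S = 7.0        # step 3: a slower re-read triggers the warning


@dataclass
class SimDrive:
    """Hypothetical drive model: each field is the time an operation takes."""
    has_erc: bool
    read_s: float           # duration of the initial read
    write_s: float = 0.1    # duration of the stripe rewrite
    reread_s: float = 0.1   # duration of the verifying re-read


def handle_read(drive: SimDrive) -> List[str]:
    """Return the sequence of events the proposed read path would produce."""
    limit = ERC_TIMEOUT_S if drive.has_erc else KERNEL_TIMEOUT_S
    if drive.read_s <= limit:
        return ["read ok"]

    if drive.has_erc:
        # 1a) the drive itself reports the error; nothing changes here
        return ["drive returned read error", "current path followed"]

    # 1b) desktop drive: the kernel gives up and the stripe is recomputed
    events = ["kernel timeout", "stripe recomputed"]

    # 2) lock the stripe, allow the long timeout, write the stripe back
    events.append("stripe locked")
    if drive.write_s > REWRITE_TIMEOUT_S:
        events.append("drive kicked")
        return events
    events.append("stripe rewritten")

    # 3) re-read and time it; a slow re-read is treated as a warning sign
    if drive.reread_s > SLOW_REREAD_S:
        events.append("warning: imminent disk fail")
    else:
        events.append("re-read ok")
    return events
```

For instance, handle_read(SimDrive(has_erc=False, read_s=200, reread_s=9))
ends with the "imminent disk fail" warning, while a write_s above 180
ends with the drive being kicked.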

That then deals with all these "writing new stuff during the timeout"
problems, and it copes with all these transient problems (I get the
impression that a lot of these read-errors are actually firmware
glitches). It still detects a failing recording medium, because the
re-read is actively looking for trouble.

Still doesn't fix the problem of the luser not reading their logs,
though :-)

Cheers,
Wol