Re: smart short test crashes software raid array?

Brad Campbell <lists2009@xxxxxxxxxxxxxxx> · Mon, 11 Mar 2019 15:49:28 +0800

On 11/3/19 6:43 am, Adam Goryachev wrote:
On 11/3/19 2:10 am, Wols Lists wrote:
On 10/03/19 11:14, Reindl Harald wrote:

I'd like to modify the raid layer such that it times out quickly, and
recalculates and rewrites the data after a few seconds, such that these
drives cease to be a problem,

I probably know a lot less here, but is this a RAID layer issue? Isn't 
it up to root to decide how long the timeout is (below the RAID layer, 
e.g. in the SATA or SCSI layer)? Ideally, you don't WANT the RAID layer to 
kick a "slow" disk: we don't know *why* it is slow, and as soon as we 
kick it, or make it do more work than it already is doing, we risk making 
the "slow" problem even worse (i.e., losing redundancy).

You can't do anything about the timeout. The *problem* is that if you try to 
do what you suggest (time out quickly, recalculate and write), it'll die 
painfully, which is *why* we run the script to increase the kernel timeouts.
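For anyone who hasn't seen it, the per-disk kernel command timeout on Linux 
lives in /sys/block/<dev>/device/timeout, in seconds. Below is a minimal 
Python sketch of bumping it, assuming a SCSI/SATA disk visible under sysfs. 
The helper name and its sysfs_root parameter are hypothetical (the parameter 
only exists so it can be tried against a scratch directory); this is not the 
actual script.

```python
from pathlib import Path


def set_kernel_timeout(dev: str, seconds: int, sysfs_root: str = "/sys") -> int:
    """Set the kernel's command timeout (in seconds) for one disk.

    Hypothetical helper: writes <sysfs_root>/block/<dev>/device/timeout
    and returns the previous value. Needs root against the real /sys.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    path = Path(sysfs_root) / "block" / dev / "device" / "timeout"
    previous = int(path.read_text().strip())
    # Same effect as: echo <seconds> > /sys/block/<dev>/device/timeout
    path.write_text(f"{seconds}\n")
    return previous
```

Run as root, something like set_kernel_timeout("sda", 180). The value is a 
runtime setting and goes back to the default on reboot, so it has to be 
reapplied at every boot.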

Specifically, the issue is that the drive will take as long as it takes and 
you can't make it go any quicker. While it's doing its thing it is 
non-responsive on the bus. So with the default timeout of ~30 seconds it 
goes like this:

Time = 0 - Drive - "Choke on a duff sector, let me try and take ~120 
seconds to read it"
Time = 30 - Kernel - "Hello drive, you've taken too long. Let me 
re-calculate that data and have you re-write it".
Time = 30 - Drive ....
Time = 30 + not very long - Kernel - "I see the drive is dead. Write 
failed. Kick it from the array and pretend we can't see it anymore".
Time = (somewhere more than 30 and less than 180) - Drive - "I'm done. 
Didn't get a good read, so have a read error in return"
Time = _same_ - Kernel .....

Dead, gone. Somewhere in there might be a bus reset, but the drive will 
ignore that also while it's off into the weeds.
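The two timelines boil down to one comparison: does the drive finish its 
internal retry before the kernel gives up on it? A toy Python model of that 
(not real kernel or md logic); the function name is hypothetical and the 
120/30 second figures are just the numbers from the timeline above.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    kicked: bool    # True if md fails the drive out of the array
    stall_s: float  # how long the array sits waiting on this one sector


def simulate_bad_sector(drive_retry_s: float, kernel_timeout_s: float) -> Outcome:
    """Toy model of the timelines in this mail, not real kernel logic.

    If the drive's internal retry outlasts the kernel timeout, the kernel
    gives up first, the rewrite lands on a drive that is still off in the
    weeds, the write fails and md kicks the drive. Otherwise the drive
    reports a read error in time and md rewrites the sector from redundancy.
    """
    if drive_retry_s > kernel_timeout_s:
        # Stall is counted only up to the kernel giving up (simplification).
        return Outcome(kicked=True, stall_s=kernel_timeout_s)
    return Outcome(kicked=False, stall_s=drive_retry_s)
```

simulate_bad_sector(120, 30) gives Outcome(kicked=True, stall_s=30), which is 
the "dead, gone" case. Swap the numbers around so the drive finishes first 
and the drive stays in the array.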

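Now make the drive give up before the kernel does, either by turning the 
kernel timeout past the point of the longest drive retry or by capping the 
drive's retry time with SCTERC, and things look different. With SCTERC: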

Time = 0 - Drive - "Choke on a duff sector, let me try and take ~7 
seconds to read it"
Time = (somewhere less than or equal to 7) - Drive - "I'm done. Didn't 
get a good read, so have a read error in return"
Time = _same_ - Kernel - "Thanks for that. Here let me give you a sector 
to re-write"
Time = _same_ - Drive - "Done, anything else?"

So with SCTERC you turn that indeterminate 30 < X < 180 second pause 
into a definite delay of no more than the default (or whatever time you 
set), and all those issues go away.
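The usual policy is to set SCTERC where the drive supports it and fall back 
to a long kernel timeout where it doesn't. A dry-run Python sketch of that 
decision, producing command lines rather than running them. The 7 second and 
180 second values are the commonly used ones, the function names are 
hypothetical, and detecting SCTERC support (e.g. from the output of 
smartctl -l scterc /dev/sdX) is left to the caller.

```python
from typing import List

ERC_SECONDS = 7.0         # commonly used SCT ERC limit (assumed; adjust to taste)
FALLBACK_TIMEOUT_S = 180  # longer than a desktop drive's worst-case retry


def scterc_args(dev: str, seconds: float) -> List[str]:
    """smartctl argv setting read and write ERC; smartctl takes 100 ms units."""
    deciseconds = round(seconds * 10)
    return ["smartctl", "-l", f"scterc,{deciseconds},{deciseconds}", f"/dev/{dev}"]


def plan_for_disk(dev: str, supports_scterc: bool) -> str:
    """Return the shell command line to apply for one disk (dry run only)."""
    if supports_scterc:
        # Cap the drive's own retries well below the kernel's 30 s default.
        return " ".join(scterc_args(dev, ERC_SECONDS))
    # No SCTERC: make the kernel outwait the drive instead.
    return f"echo {FALLBACK_TIMEOUT_S} > /sys/block/{dev}/device/timeout"
```

plan_for_disk("sda", True) yields "smartctl -l scterc,70,70 /dev/sda", i.e. 
7.0 seconds for reads and writes. On many drives the SCTERC setting does not 
survive a power cycle, so like the kernel timeout it wants reapplying at boot.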

The other thing, of course, is that while you are waiting on the commodity 
drive to do its thing, the array is sitting there doing nothing.

So, no. You can't do what you intend to do. It won't work. Buy decent 
drives.

If you do manage to figure out how to make it work (surely you could 
power cycle and bus reset a drive in less than 120 seconds), then I'll 
buy shares in your storage company.

As for SMART tests: I used to schedule SMART long tests on a 24-drive 
array to start 1 hour before the monthly RAID scrub, *while* beating them 
up with a heavy random read/write load. Things always got glacial, but I 
never lost a drive, so I suspect there is something else going on.
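For anyone wanting to reproduce that sort of stress test, a hypothetical 
Python sketch of the ordering: SMART long tests on every member kicked off an 
hour before an md "check" scrub. md0 and the disk names are placeholders, and 
it only produces (start time, command line) pairs to feed to cron or a timer; 
it doesn't run anything.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def schedule(
    scrub_start: datetime,
    md_dev: str,
    disks: List[str],
    lead: timedelta = timedelta(hours=1),
) -> List[Tuple[datetime, str]]:
    """Hypothetical planner: SMART long tests `lead` before the md scrub."""
    smart_start = scrub_start - lead
    jobs = [(smart_start, f"smartctl -t long /dev/{d}") for d in disks]
    # Writing "check" to sync_action starts a read-only md scrub.
    jobs.append((scrub_start, f"echo check > /sys/block/{md_dev}/md/sync_action"))
    return jobs
```

The heavy random read/write load on top is whatever benchmark you like; the 
point is only that the long tests are already running when the scrub starts.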

Since it's reproducible, how about some block traces?

Regards,
Brad