On 11/3/19 6:43 am, Adam Goryachev wrote:
On 11/3/19 2:10 am, Wols Lists wrote:
On 10/03/19 11:14, Reindl Harald wrote:
I'd like to modify the raid layer such that it times out quickly, and
recalculates and rewrites the data after a few seconds, such that these
drives cease to be a problem,
I probably know a lot less here, but is this a RAID layer issue? Isn't
it up to root to decide how long the timeout is (below the RAID layer,
eg SATA or SCSI etc layer)? Ideally, you don't WANT the RAID layer to
kick a "slow" disk, we don't know *why* it is slow, and as soon as we
kick it, or make it do more work than it already is, then we risk making
the "slow" problem even worse (ie, losing redundancy).
You can't do anything about the timeout. The *problem* is if you try and
do what you suggest (time out quickly, recalculate and write) it'll die
painfully, which is *why* we run the script to increase the kernel timeouts.
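
For anyone who hasn't seen it, the kernel side of that script boils down
to something like the sketch below. This is not the actual script, just
an illustration: the helper names, the 180 second default (taken from the
upper bound I use further down) and the sysfs_root parameter (only there
so it can be pointed at a scratch directory) are my assumptions. The
per-device timeout file under /sys/block/<dev>/device/timeout is the
standard SCSI-layer knob, and writing to it needs root.

```python
from pathlib import Path


def timeout_path(dev, sysfs_root="/sys"):
    """Path of the SCSI command timeout file for a block device such as 'sda'."""
    return Path(sysfs_root) / "block" / dev / "device" / "timeout"


def read_timeout(dev, sysfs_root="/sys"):
    """Return the current kernel command timeout for dev, in seconds."""
    return int(timeout_path(dev, sysfs_root).read_text().strip())


def raise_timeout(dev, seconds=180, sysfs_root="/sys"):
    """Raise the timeout to at least `seconds`; never lower an existing value.

    Returns the timeout in effect afterwards. 180 is an assumed value, chosen
    to sit above the longest drive retry discussed in this thread.
    """
    current = read_timeout(dev, sysfs_root)
    if current >= seconds:
        return current
    timeout_path(dev, sysfs_root).write_text(f"{seconds}\n")
    return seconds
```

You'd call raise_timeout("sdX") for each array member at boot, since the
setting doesn't survive a reboot.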
Specifically the issue is the drive will take as long as it takes and
you can't make it go any quicker. While it's doing its thing it is
non-responsive on the bus. So the default timeout of ~30 seconds goes
like this:
Time = 0 - Drive - "Choke on a duff sector, let me try and take ~120
seconds to read it"
Time = 30 - Kernel - "Hello drive, you've taken too long. Let me
re-calculate that data and have you re-write it".
Time = 30 - Drive ....
Time = 30 + not very long - Kernel - "I see the drive is dead. Write
failed. Kick it from the array and pretend we can't see it anymore".
Time = (somewhere more than 30 and less than 180) - Drive - "I'm done.
Didn't get a good read, so have a read error in return"
Time = _same_ - Kernel .....
Dead, gone. Somewhere in there might be a bus reset, but the drive will
ignore that also while it's off into the weeds.
Now get the drive's retry time under the kernel timeout (either turn the
kernel timeout past the point of the longest drive retry, or cap the
retry with SCTERC) and things look different.
Time = 0 - Drive - "Choke on a duff sector, let me try and take ~7
seconds to read it"
Time = (somewhere less than or equal to 7) - Drive - "I'm done. Didn't
get a good read, so have a read error in return"
Time = _same_ - Kernel - "Thanks for that. Here let me give you a sector
to re-write"
Time = _same_ - Drive - "Done, anything else?"
So with SCTERC you turn that indeterminate 30 < X < 180 second pause
into a definite "no longer than the drive's default, or the time you set"
delay and all those issues go away.
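
If it helps, the two timelines above reduce to one comparison. A toy
model only, not what the kernel actually executes: the function name and
the outcome strings are made up for illustration, and I'm assuming a
drive that gives up exactly at the timeout still loses the race.

```python
def read_error_outcome(kernel_timeout_s, drive_retry_s):
    """Toy model of a bad-sector read on an array member.

    kernel_timeout_s: how long the kernel waits before giving up on a command.
    drive_retry_s:    how long the drive keeps retrying before it reports
                      the read error (capped by SCTERC if that is set).
    """
    if drive_retry_s < kernel_timeout_s:
        # Drive answers in time: the kernel recalculates the data from the
        # other members and has the drive rewrite the sector.
        return "read error reported, sector rewritten"
    # Kernel gives up first, the rewrite hits an unresponsive drive, the
    # write fails and the drive is kicked from the array.
    return "drive kicked from array"


# The cases from the timelines above:
print(read_error_outcome(30, 120))   # default timeout, drive off in the weeds
print(read_error_outcome(30, 7))     # SCTERC caps the retry at ~7 seconds
print(read_error_outcome(180, 120))  # kernel timeout raised past the retry
```

Either knob works, as long as the drive's number ends up smaller than the
kernel's.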
Of course the other thing is while you are waiting on the commodity
drive to do its thing, the array is sitting there doing nothing.
So, no. You can't do what you intend to do. It won't work. Buy decent
drives.
If you do manage to figure out how to make it work (surely you could
power cycle and bus reset a drive in less than 120 seconds) then I'll
buy shares in your storage company.
As for SMART tests. I used to schedule SMART long tests on a 24 drive
array to start 1 hour before the monthly RAID scrub *while* beating them
up with a heavy random read/write load. Things always got glacial but I
never lost a drive, so I suspect there is something else going on.
Since it's reproducible, how about some block traces?
Regards,
Brad