Re: smart short test crashes software raid array?

Adam Goryachev <mailinglists@xxxxxxxxxxxxxxxxxxxxxx> · Tue, 12 Mar 2019 09:37:38 +1100

On 12/3/19 5:14 am, Wols Lists wrote:
> On 11/03/19 12:31, Nix wrote:
>> On 10 Mar 2019, Wols Lists uttered the following:
>>
>>> I'd like to modify the raid layer such that it times out quickly, and
>>> recalculates and rewrites the data after a few seconds, such that these
>>> drives cease to be a problem, but stick that on the long list of raid
>>> papercuts I'd like to sort out when I can find the time to learn to
>>> program the raid subsystem!
>> I don't see how that could work. When these drives get stuck on lengthy
>> retries, they are essentially unresponsive:
> So any code needs to take that into account. Pain in the arse, but when
> the Linux read times out, the re-write code needs to detect that the
> drive is one of these cheapos, and spawn a thread that waits for the
> drive time-out before rewriting it.
>
> Of course, that's going to cause a host of other issues that will need
> sorting/fixing :-) - the obvious one is what happens if something else
> re-writes that block in the middle of the time-out period ...
>
> Cheers,
> Wol

Doesn't this happen already? The drive will either return the data (if
it magically succeeds in reading the requested data within those 180(?)
seconds) or return a read error. If MD gets the data, it carries on
normally, albeit with a delay. If MD gets a read error, it will
automatically reconstruct the data (assuming a working RAID array with
sufficient redundancy to do that without the data we were trying to
read) and issue a write to the drive. If the drive fails to write the
data and returns an error, then the drive is kicked from the array.
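
As a rough illustration of the decision flow in the paragraph above, here
is a toy model in Python. It is not MD code: the function name, the boolean
inputs and the outcome strings are all made up for the sketch, and the
"cannot reconstruct" branch is an assumption that the paragraph does not
cover.

```python
def md_handle_read(read_ok: bool, can_reconstruct: bool, rewrite_ok: bool) -> str:
    """Toy model of the read-error flow described above (hypothetical names)."""
    if read_ok:
        # The drive returned the data, possibly after a long delay.
        return "carry_on"
    if not can_reconstruct:
        # Assumption, not from the mail: without enough redundancy the
        # error simply goes back to whoever asked for the data.
        return "error_to_caller"
    # MD rebuilds the block from the other members and writes it back.
    if rewrite_ok:
        return "rewritten"
    # The rewrite itself failed, so the drive is kicked from the array.
    return "drive_kicked"
```

For example, `md_handle_read(False, True, False)` gives `"drive_kicked"`,
which is the only path in this model that costs redundancy.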

AFAIR, the "problem" was that the kernel isn't configured (by default)
to wait 180s. It will try to reset the SATA bus and report a failed
read to MD, and MD will issue the write request. The kernel is still
trying to re-contact the drive, and the drive is still busy trying to
complete the original read request, so we get a second timeout. The
kernel tries to reset the SATA bus again and reports a failed write to
MD, which now kicks the drive.
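
The timing argument above can be sketched as a very simplified model. The
assumptions are mine: the drive is busy for a fixed `drive_retry_s` seconds,
the kernel gives up on each command after `kernel_timeout_s` seconds
(commonly 30 by default, as far as I know), MD issues the rewrite as soon as
the read is reported failed, and bus reset time is ignored. The function
name is hypothetical.

```python
def drive_survives(drive_retry_s: float, kernel_timeout_s: float) -> bool:
    """Toy timeline: does the drive stay in the array after one bad read?"""
    # The read is issued at t=0; the drive is busy until t=drive_retry_s.
    if kernel_timeout_s >= drive_retry_s:
        # The drive answers on its own (data or a read error) before the
        # kernel gives up, so MD can handle it the normal way.
        return True
    # The kernel gave up at t=kernel_timeout_s and MD issued the rewrite,
    # which gets the same timeout again.
    write_deadline_s = 2 * kernel_timeout_s
    # If the drive is still busy at that deadline, the write fails too
    # and MD kicks the drive.
    return drive_retry_s <= write_deadline_s
```

Under these assumptions `drive_survives(180, 30)` is `False`, while
`drive_survives(180, 180)` and `drive_survives(7, 30)` are both `True`.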

So, as long as root (i.e., the administrator) configures the kernel to
match the installed hardware (or the distribution magically detects and
configures this on behalf of the administrator), everything works
well: a single failed read request causes no loss of redundancy and no
failed RAID arrays. Of course, there is still the 180s delay/freeze,
but that is a "better" overall outcome, and a good enough solution
for most admins/users.
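
One way to do the "configure the kernel" step above is to raise the
per-device command timeout through sysfs. A minimal sketch, assuming the
disk exposes `/sys/block/<device>/device/timeout` (SCSI/SATA disks
normally do) and that the script runs as root. The function name and the
`sysfs_root` parameter are mine, the latter only there so the function can
be tried against a scratch directory.

```python
from pathlib import Path


def set_command_timeout(device: str, seconds: int,
                        sysfs_root: str = "/sys/block") -> int:
    """Write the command timeout for one disk and return the value read back."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    # e.g. /sys/block/sda/device/timeout
    path = Path(sysfs_root) / device / "device" / "timeout"
    path.write_text(f"{seconds}\n")
    return int(path.read_text().strip())
```

Calling `set_command_timeout("sda", 180)` would match the 180s figure used
in this thread. The setting does not survive a reboot, so it needs to be
reapplied at boot (udev rule, init script, or similar).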

If it becomes a problem, then the admin can fix (replace) the hardware
with better options and solve both problems, reducing the "freeze/delay"
from around 180s to around 7s. (BTW, a 7s delay could also be
unacceptable for any number of users/admins. Personally, my users insist
on a 0.1s delay or less for *everything*, and anything worse is a major
incident.)
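
On drives that support SCT Error Recovery Control, the roughly 7s limit is
something smartctl can set with `-l scterc,READ,WRITE`, where both values
are in tenths of a second. The sketch below only builds the command line
and does not run it. The function name is hypothetical, and whether a given
drive accepts the setting is an assumption to check per drive (the cheap
drives discussed in this thread often do not).

```python
from typing import List


def scterc_command(device: str, read_seconds: float,
                   write_seconds: float) -> List[str]:
    """Build a smartctl command line that sets the SCT ERC limits."""
    # smartctl expects the limits in units of 100 ms (deciseconds).
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]
```

`scterc_command("/dev/sda", 7, 7)` yields
`["smartctl", "-l", "scterc,70,70", "/dev/sda"]`, which could then be handed
to `subprocess.run` by someone with root access.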

Regards,
Adam

--
Adam Goryachev Website Managers www.websitemanagers.com.au
