On Tue, 2014-01-14 at 08:14 -0500, Phil Turmel wrote:
> ?. What did "smartctl -l scterc" say?  If it says unsupported, you have
> a problem.  The workaround is to set the driver timeouts to ~180 seconds
> for each such drive.
>
> If scterc is supported, but disabled, you can set 7-second timeouts with
> "smartctl -l scterc,70,70", but you must do so on every power cycle.
> Either way, you need boot-time scripting or distro support.
>
> Raid-rated drives power up with a reasonable setting here.
>
> Many people discover the timeout problem the first time they have an
> otherwise correctable read error in their array, and the array falls
> apart instead.  This list's archives are well-populated with such cases.

Snipped for brevity above.

I understand the "timeout" issue on drives that might perform long
internal error recovery: the device (block?) driver issues a timeout
first, and mdadm then kicks the drive. With the long driver timeout you
allow the drive some time to try and fix things, at the expense of an
array that hangs for a longer period.

I also understand that with scterc the drive gives up (in effect timing
itself out) when it hits the 7-second mark, or thereabouts, and mdadm
subsequently kicks the drive out. In this instance the idea is to kill a
drive quickly so that the raid doesn't hang for longer than a few
seconds.

However, surely both approaches (bar the amount of time) lead to the
same final result: a drive being kicked out. Even in a non-mdadm
hardware raid setup, the drive is either kicked because it didn't return
within 7 seconds, or it kicks itself because it gave up before 7 seconds.

If anything, surely when you have a degraded array that will fail if any
more disks are kicked, you actually need to do the reverse of normal
raid wisdom...
which is to set the timeout in the device (block) layer as long as
possible and, if the drives have scterc enabled, to disable it (assuming
the drive physically allows that and, once disabled, performs a harder,
or any, internal retry/CRC/etc.). That would force the drives to give
their all to get back any as yet unknown failing sectors, should they
turn up during a rebuild of a failed drive.

Surely, unless I'm missing something, rebuilding a failed drive's data
means you want the system not to kick a drive if at all possible, so
having scterc enabled, or a timeout shorter than the drive's maximum
recovery time (unless that is indefinite retry), is the last thing you
want?

>
> Regards,
>
> Phil

Jon
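
For reference, the boot-time scripting mentioned in the quoted advice
could look roughly like the sketch below. It is only an illustration
under a few assumptions: that smartctl exits non-zero when a drive
rejects the scterc setting (worth verifying on your own hardware), that
the array members match /dev/sd?, and that the driver timeout lives at
/sys/block/<dev>/device/timeout. The function and variable names are
made up for this example, and nothing touches a drive unless the script
is run as root with --apply.

```shell
#!/bin/sh
# Sketch only: values come from the quoted advice, names are hypothetical.

ERC_DECISECONDS=70      # 7.0 s, as in "smartctl -l scterc,70,70"
DRIVER_TIMEOUT=180      # seconds, for drives without usable scterc

# Print the sysfs path holding the driver command timeout for a drive.
# Accepts either "sda" or "/dev/sda".
timeout_path() {
    printf '/sys/block/%s/device/timeout\n' "${1##*/}"
}

# Try to set a 7-second ERC limit; if the drive refuses, fall back to a
# long driver timeout so the drive can finish its own error recovery.
fix_drive() {
    if smartctl -l scterc,"$ERC_DECISECONDS","$ERC_DECISECONDS" "$1" >/dev/null 2>&1; then
        echo "$1: scterc set to $ERC_DECISECONDS deciseconds"
    else
        echo "$DRIVER_TIMEOUT" > "$(timeout_path "$1")"
        echo "$1: no scterc, driver timeout set to $DRIVER_TIMEOUT s"
    fi
}

# Only touch hardware when explicitly asked to.
if [ "${1:-}" = "--apply" ]; then
    for dev in /dev/sd?; do
        fix_drive "$dev"
    done
fi
```

For the degraded-array case raised above, the same two knobs would
simply be turned the other way: scterc disabled where the drive allows
it, and the driver timeout raised on every remaining member.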