Re: What are mdadm maintainers to do? (error recovery redundancy/data loss)

On Tue, 17 Feb 2015, Chris wrote:

Everybody, please reply with improved versions if you can.

if smartctl tool is available
 if scterc is disabled
   /usr/sbin/smartctl -l scterc,70,70 ${DEVNAME}
 else
   if scterc is not available
     echo 180 >/sys/block/${DEVNAME}/device/timeout
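For reference, the pseudocode above might look roughly like this in POSIX shell. This is only a sketch under assumptions: the function name configure_erc and the SMARTCTL and SYSFS_ROOT overrides are made up for illustration, the function takes a bare kernel name such as sda rather than a /dev path, and the matching relies on smartctl printing "Read:" and "Write:" lines (with "Disabled" when ERC is off) for drives that support scterc.

```shell
#!/bin/sh
# Sketch only. SMARTCTL and SYSFS_ROOT are hypothetical overrides added for
# illustration and testing; they are not part of the proposal above.
SMARTCTL="${SMARTCTL:-/usr/sbin/smartctl}"
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Usage: configure_erc sda   (kernel name, without the /dev/ prefix)
configure_erc() {
    name="$1"

    # No smartctl tool: leave the drive alone, as in the pseudocode.
    command -v "$SMARTCTL" >/dev/null 2>&1 || return 0

    # Query the current SCT Error Recovery Control state.
    status=$("$SMARTCTL" -l scterc "/dev/$name" 2>/dev/null)

    case "$status" in
        *"Read:"*Disabled*)
            # scterc is supported but disabled: set 7.0 s read/write limits
            # (the values are in units of 100 ms).
            "$SMARTCTL" -l scterc,70,70 "/dev/$name" >/dev/null
            ;;
        *"Read:"*)
            # scterc is already enabled: nothing to do.
            ;;
        *)
            # scterc is not available: raise the kernel command timeout.
            echo 180 > "$SYSFS_ROOT/block/$name/device/timeout"
            ;;
    esac
}
```

It would be called once per member drive, for example `configure_erc sda`, and needs root to change either setting.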

Found an older implementation that "seems to work fine":

Hi,

Generally I like this idea, but if I were running raid0 or linear, I might not want scterc to be enabled.

Also, what would be the harm in always bumping the timeout to 180 seconds? Yes, drives would take longer to be kicked out in case of errors, but if we're confident that scterc works, wouldn't we then want to turn the timeout down to 10-15 seconds?

Personally, I turn on scterc if available and turn up the timeout to 180 seconds, always, regardless of what drives I'm running. I'd rather wait longer for a drive to be considered dead than have drives kicked due to some hiccup in the system (controller or drive reset) that might rectify itself.

So I would suggest turning on scterc and turning up the timeout to 180 seconds as soon as mdadm is installed. This is the best tradeoff I can come up with between stability and fast drive-dead-detection time.
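A minimal sketch of that "always do both" policy, in POSIX shell. The function name apply_raid_timeouts and the SMARTCTL and SYSFS_ROOT overrides are assumptions made for illustration, as is the choice to cover every sd* device rather than only array members.

```shell
#!/bin/sh
# Sketch only. SMARTCTL and SYSFS_ROOT are hypothetical overrides added for
# illustration and testing.
SMARTCTL="${SMARTCTL:-smartctl}"
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

apply_raid_timeouts() {
    for timeout_file in "$SYSFS_ROOT"/block/sd*/device/timeout; do
        # Skip unmatched globs and files we are not allowed to write.
        [ -w "$timeout_file" ] || continue

        # .../block/sda/device/timeout -> sda
        name=${timeout_file#"$SYSFS_ROOT"/block/}
        name=${name%%/*}

        # Ask for 7.0 s error recovery; a failure (for example on a drive
        # without scterc) is deliberately ignored.
        "$SMARTCTL" -l scterc,70,70 "/dev/$name" >/dev/null 2>&1 || true

        # Always raise the kernel command timeout to 180 seconds.
        echo 180 > "$timeout_file"
    done
}
```

Neither setting survives a power cycle, so something like this would have to run at every boot (or from a udev rule) rather than once at install time.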

Here on the list I see people coming in all the time with multiple drives kicked due to controller resets and other intermittent flukes; I never see people complaining that it took 30 seconds to detect a drive error. I doubt there'd be much complaint about 180 seconds. If someone needs faster detection times, then in my opinion they are in the category of users who can be expected to tune this value to their application. 180 seconds works best for the "larger crowd" using mdadm.

--
Mikael Abrahamsson    email: swmike@xxxxxxxxx



