Hi Mike,

On 04/10/2013 11:26 AM, Mike VanHorn wrote:
> For some reason, my replies to the linux-raid list aren't going
> through, and not all of the messages from the list seem to be
> getting to me, either, so I hope it is okay that I am replying
> to you directly.

It's ok, but I am adding the list back.

> Also, Microsoft's mail server from whence my message was
> originating has been blacklisted on your server, so I am
> sending this to you from my personal account on Yahoo!.

You really need to fix your server, then, or just use this yahoo
account for linux-raid.  My server just uses standard SPF validation
and common dns blacklists.

> In your reply, you said
>
>> I recommend:
>>
>> 1) Fix timeouts as needed.  Either set your drives' ERC to 7.0
>> seconds, or raise the driver timeouts ~180 seconds.
>
> As it turns out, the drives in question aren't ERC capable:
>
> # smartctl -l scterc,70,70 /dev/sdc
> smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.18-308.13.1.el5] (local build)
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
>
> Warning: device does not support SCT Error Recovery Control command
> #
>
> However, when I do the following
>
> for x in /sys/block/sd[cdfghij] ; do echo $x: $(< $x/device/timeout) ; done>timeout.txt
>
> I get output such as
>
> /sys/block/sdj: 180
>
> because it seems that I've previously discovered that they aren't
> ERC capable, as I'm setting the timeout in /etc/rc.local like so:
>
> echo 180 >/sys/block/sdc/device/timeout
> echo 180 >/sys/block/sdd/device/timeout
> echo 180 >/sys/block/sde/device/timeout
> echo 180 >/sys/block/sdf/device/timeout
> echo 180 >/sys/block/sdg/device/timeout
> echo 180 >/sys/block/sdh/device/timeout
> echo 180 >/sys/block/sdi/device/timeout
> echo 180 >/sys/block/sdj/device/timeout
>
> Doing this is what is meant by changing the driver's timeout, correct?

Yes.

> Should I be setting this for an even longer period of time?

No.
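
If it helps, the per-drive echo lines can be collapsed into one small
function.  This is only a sketch, not something taken from your
rc.local: the function name set_drive_timeouts and the SYSFS_ROOT
override are made up for illustration (SYSFS_ROOT just lets you try the
loop against a scratch directory instead of the real /sys).  It writes
the given timeout to each named drive and reports any drive it could
not update.

```shell
#!/bin/sh
# Sketch: write a SCSI command timeout (in seconds) for each named drive.
# set_drive_timeouts and SYSFS_ROOT are hypothetical names used for
# illustration; SYSFS_ROOT defaults to the real /sys.
set_drive_timeouts() {
    timeout=$1
    shift
    rc=0
    for dev in "$@"; do
        f="${SYSFS_ROOT:-/sys}/block/$dev/device/timeout"
        if [ -w "$f" ]; then
            # Same effect as: echo 180 >/sys/block/sdX/device/timeout
            echo "$timeout" > "$f"
            echo "$dev: $(cat "$f")"
        else
            # Drive missing, or not running as root
            echo "$dev: cannot write $f" >&2
            rc=1
        fi
    done
    return $rc
}
```

In /etc/rc.local, running as root, that would be called as
"set_drive_timeouts 180 sdc sdd sde sdf sdg sdh sdi sdj".  The value
does not survive a reboot, so it has to be set again on every boot, as
you are already doing.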

> Thank you for helping me to understand what is going on!

Are you already doing weekly scrubs and drive self-tests?  Do you still
have the complete dmesg from the original triple failure?

> Mike VanHorn
> Senior Computer Systems Administrator
> College of Engineering and Computer Science
> Wright State University
> 265 Russ Engineering Center
> 937-775-5157
> michael.vanhorn@xxxxxxxxxx
> http://www.cecs.wright.edu/~mvanhorn/

Phil
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html