On 12/17/2013 10:45 PM, Julie Ashworth wrote:
> hi Phil,
> thanks again for your help. It was surprisingly easy to install the latest smartmontools.
>
> On 17-12-2013 14.43 -0500, Phil Turmel wrote:
>> I was interested in the reallocation counts, the current pending
>> sectors, and the scterc timeouts. The latter were not present, and are
>> important.
>
> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
>   5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    3
> 197 Current_Pending_Sector  -O--C-   100   100   000    -    1
> SCT Error Recovery Control:
>            Read:    100 (10.0 seconds)
>           Write:    100 (10.0 seconds)
>
> (I also attached the full output)
>
> I verified that a weekly scrub is performed via cron (default with CentOS 5), and there were no errors detected prior to the sync. The output is included in syslog reports.

Very good. You do not have a timeout mismatch problem. But the behavior of /dev/sdb does not match its health. That suggests some other problem is present, like a bad SATA cable or socket, a bad power supply, bad cooling, et cetera.

>> But /dev/sdb has three reallocations and only one pending error. That's
>> an old drive, but not sick. I'd be concerned that there're other
>> hardware issues in your system if the timeout issue is not part of the
>> problem.
>
> Should I run the sync (mdadm -a) in verbose mode? If so, what is the best way to terminate the current sync? By failing/removing /dev/sda?

I'd let the sync continue until it fails or completes. And if it completes, exercise the array to see if it stays flaky. If it does not complete, start swapping parts in the system.

Regards,

Phil

ps. I'll be offline all day today; I'm sure the list will chip in if you need more help.
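
For anyone following the thread later, the "no timeout mismatch" reading of the smartctl output quoted above can be sketched as a small check. This is only an illustration, not something posted in the thread: the function names parse_smart_report and has_timeout_mismatch are hypothetical, the input is the abbreviated output quoted above rather than a full smartctl report, and the 30-second figure assumes the Linux default command timeout (the value normally found in /sys/block/<dev>/device/timeout) has not been changed. A mismatch is taken to mean that the drive's error recovery limit is missing or is not shorter than the kernel's timeout.

```python
import re

# Assumption: Linux's default SCSI command timeout, in seconds.
KERNEL_DEFAULT_TIMEOUT_S = 30.0


def parse_smart_report(text):
    """Pull raw attribute values and SCT ERC limits (seconds) out of smartctl-style text."""
    attrs = {}
    erc = {}
    for line in text.splitlines():
        # Tolerate email quote markers and indentation in pasted output.
        line = line.lstrip("> ").strip()
        # Attribute rows: ID, name, flags, value, worst, thresh, fail, raw value.
        m = re.match(r"^(\d+)\s+(\S+)\s+\S+\s+\d+\s+\d+\s+\d+\s+\S+\s+(\d+)", line)
        if m:
            attrs[m.group(2)] = int(m.group(3))
            continue
        # SCT ERC rows such as "Read: 100 (10.0 seconds)"; the count is in tenths of a second.
        m = re.match(r"^(Read|Write):\s+(\d+)\s+\(", line)
        if m:
            erc[m.group(1).lower()] = int(m.group(2)) / 10.0
    return attrs, erc


def has_timeout_mismatch(erc, kernel_timeout_s=KERNEL_DEFAULT_TIMEOUT_S):
    """True if either ERC limit is missing or is not shorter than the kernel timeout."""
    for op in ("read", "write"):
        limit = erc.get(op)
        if limit is None or limit >= kernel_timeout_s:
            return True
    return False


REPORT = """\
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
5 Reallocated_Sector_Ct PO--CK 100 100 036 - 3
197 Current_Pending_Sector -O--C- 100 100 000 - 1
SCT Error Recovery Control:
Read: 100 (10.0 seconds)
Write: 100 (10.0 seconds)
"""

attrs, erc = parse_smart_report(REPORT)
print(attrs)
print(erc)
print(has_timeout_mismatch(erc))  # prints False
```

With the values quoted above, both limits are 10 seconds, well under the assumed 30-second kernel timeout, which matches the conclusion in the reply. A drive that reports no numeric limit would be flagged as a mismatch by this sketch.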