Patrick Hoover wrote:
> However, in this case, where SMART doesn't appear to work, what are the best options for monitoring disk integrity / degradation?

    echo check > /sys/block/mdX/md/sync_action

AFAIK, with the above, MD reads through the whole array, "repairs" any bad blocks it finds, and kicks a disk if the repair fails. A failed repair means the disk's spare sector area is full and the disk desperately needs replacing.

Having a spare disk in the array is probably a good idea too, to minimize downtime, especially if you can't always get to the machine. I don't know whether MD checks spares for read errors, though that would definitely be useful.

You can monitor for kicked disks by running mdadm in daemon mode with --monitor and having it send you an email when such an event occurs.

Something else that I've found useful is:

    dd if=/dev/hdX of=/dev/null bs=1M count=512

If there is a problem in the IDE driver, a loose cable, or a controller problem, the disk might still respond but fail as soon as you read a large amount of data. I've also seen disks read some data successfully, but at a significantly lower rate than they should. Monitoring the read rate of each disk can be helpful (at least it is to me) for diagnosing such problems.
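
A rough sketch of how the dd read-rate check could be scripted, as two POSIX shell functions. The names read_rate and check_disk, the 20 MiB/s default threshold, and the return codes are made up for illustration; pick a threshold from what your healthy disks actually deliver. Timing uses whole seconds, so it is only meaningful for reads that take several seconds, and a repeated read of the same blocks may be served from the page cache rather than the disk.

```shell
#!/bin/sh
# Sketch only: time a sequential read and flag failed or slow disks.
# Function names, defaults, and return codes are illustrative assumptions.

# read_rate DEVICE [MIB]
# Read MIB mebibytes (default 512) from the start of DEVICE and print the
# approximate rate in MiB/s. Returns non-zero if dd reports an error.
read_rate() {
    dev=$1
    mib=${2:-512}
    start=$(date +%s)
    dd if="$dev" of=/dev/null bs=1M count="$mib" 2>/dev/null || return 1
    end=$(date +%s)
    elapsed=$((end - start))
    # Whole-second timing: treat anything under one second as one second.
    [ "$elapsed" -gt 0 ] || elapsed=1
    echo $((mib / elapsed))
}

# check_disk DEVICE [MIN_MIB_S] [MIB]
# Print a one-line status. Returns 0 if ok, 1 if slow, 2 if the read failed.
check_disk() {
    dev=$1
    min=${2:-20}
    mib=${3:-512}
    if ! rate=$(read_rate "$dev" "$mib"); then
        echo "$dev: read FAILED"
        return 2
    fi
    if [ "$rate" -lt "$min" ]; then
        echo "$dev: slow read, $rate MiB/s (threshold $min MiB/s)"
        return 1
    fi
    echo "$dev: ok, $rate MiB/s"
}
```

For example, running "check_disk /dev/hdX 20" from cron for each member disk and mailing any line that is not "ok" would give a crude early warning alongside mdadm --monitor.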