On 3 October 2011 14:58, Marcin M. Jessa <lists@xxxxxxxxx> wrote:
> On 10/3/11 3:39 PM, Mathias Burén wrote:
>
>> I would run badblocks on the md0 device. (increase number of blocks to
>> check at a time until you use all your available RAM)
>> After that I'd run dd.
>
> Any particular options you would give to dd ?
>
>> I would also check the SMART data on all
>> drives
>
> What's strange SMART always says all the drives are healthy.
> All of failures started with dmesg saying:
> exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata9.00: failed command: FLUSH CACHE EXT
> ata9.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata9.00: status: { DRDY }
>
> That "exception Emask" part pointed me to misc threads where people
> mentioned bugs in the Linux kernel.
>
> A reboot would somehow reset the drives and they would always be working
> fine again and I could always resync the array until the next time when a
> drive would get kicked off.
>
>> and the health of the controller.
>
> How can I run a check on that within Linux?
>
>
>
> --
>
> Marcin M. Jessa
>

Can you post the smartctl -a -T permissive (etc) output on a pastebin
somewhere, for all HDDs? What controller are you using?

/M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html