On Nov 16, 2014, at 8:39 AM, Justin Stephenson <justin@xxxxxxxxxxxxxxxxx> wrote:

> Hello,
>
> I am new to MDADM and have just experienced my first device fail on my raid 6.
>
> I am wondering if someone might be able to help by outlining a proper protocol for troubleshooting and rebuilding this array (proc/mdstat below).
>
> Here is how I might approach it:
>
> - remove the device
> - test the device
> - if the device tests OK then re add the device
> - if the device fails, then replace the device
> - resync
>
> Thank-you for your consideration.
>
> Best,
>
> - Justin
>
> Here is the mdstat email
>
> -----------------
>
> This is an automatically generated mail message from mdadm
> running on BigBlue
>
> A Fail event had been detected on md device /dev/md0.
>
> It could be related to component device /dev/sdh1.

The first step is to get your backup current.

Second, you can do this without removing the device:

# smartctl -x /dev/sdh

Then look in dmesg for errors related to its ATA designation. You should be able to get a serial number from the smartctl output and search for it with dmesg | grep <serial#> to find its ATA designation (port and device number). Then dmesg | grep ataX.YY will show any read/write error events that explain what’s going on.

While you’re at it, the following would be helpful as well:

# smartctl -l scterc /dev/sdh
# cat /sys/block/sdh/device/state
# cat /sys/block/sdh/device/timeout

These are read-only commands: they report state without changing it, so they are safe to run.

Chris Murphy
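
The read-only checks above can be collected into one small helper. This is only a minimal sketch, not something that ships with mdadm or smartmontools: diag_cmds is a hypothetical name, and it assumes you pass the whole-disk node (/dev/sdh, not the /dev/sdh1 partition). It prints the four commands rather than running them, so you can review them first.

```shell
#!/bin/sh
# diag_cmds (hypothetical helper): print, but do not run, the read-only
# checks from this message for one disk.
diag_cmds() {
    dev="$1"              # whole-disk node, e.g. /dev/sdh
    name="${dev##*/}"     # strip the directory part: /dev/sdh -> sdh
    printf '%s\n' \
        "smartctl -x $dev" \
        "smartctl -l scterc $dev" \
        "cat /sys/block/$name/device/state" \
        "cat /sys/block/$name/device/timeout"
}

diag_cmds /dev/sdh
```

Once the list looks right, it can be run as root with diag_cmds /dev/sdh | sh. Keeping the printing and the running separate means nothing touches the disk until you have read the commands.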