On Thursday 24 October 2013 14:44:14 Mikael Abrahamsson wrote:
> On Thu, 24 Oct 2013, yuji_touya@xxxxxxxxxxxxxxxxxxxx wrote:
>
> > Here's syslog entries about raid10 and smartctl output.
> > sdb seems to have too many bad blocks. Is that the reason why sdb was kicked out?
>
> Most likely.
>
> > I'm going to copy files from /dev/md0 to anywhere else as soon as possible.
> > Should I repair filesystem before copying? (like xfs_repair /dev/md0)
>
> What you need to do now is to use dd_rescue or equivalent to copy the data
> off of sdb to a good drive. Stop the array first. This means you'll lose
> data on the bad blocks. After this is done, and you have assembled the
> array with the good drive with (most of) the data from sdb, start the
> array, then hot-add in sdc and let things sync up. You should now have
> redundancy.

Hi all!

Just had a fight with this myself, also using Seagate drives.

And I don't think he needs to lose any data, nor use ddrescue here. Just
enabling SCT ERC (scterc), setting the timeout, and then running a repair
on the array fixed it for me. SCT ERC is disabled by default on these
drives and goes back to disabled whenever the drive is powered down, so it
has to be set again after every power cycle. md was smart enough to try to
rewrite the sector(s) that had failed, and with scterc enabled the drive
then reallocated the failed sectors. (I thought I had this done already,
but a syntax error in my script had prevented it from working... :-( )

The working script I ran for this was (the scterc values are in tenths of
a second, so 70,70 means 7 seconds for reads and writes; the timeout is
in seconds):

=============================
# Set up RAID drive timeouts
for x in b c d e
do
    smartctl -l scterc,70,70 /dev/sd$x
    echo 180 >/sys/block/sd$x/device/timeout
done
==============================

After that, run:

echo "repair" >/sys/block/md0/md/sync_action

This should move the 112 "Pending" sectors over to "Reallocated_Sector_Ct"
in the smartctl output and fix your array.

After that, re-add the drive (sdc) that has been missing almost since the
array was created, and keep a close eye on its error counts.
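Not part of the procedure above, just a small sketch of how one might keep
an eye on those counters from a script. The function names smart_raw and
check_counts are hypothetical. They assume the usual "smartctl -A" table
layout (attribute name in the second column, RAW_VALUE in the tenth), and
the default limit of 200 is only the Seagate figure mentioned in this
thread, not something smartmontools defines. They only parse text on
stdin and never touch the drive themselves.

```shell
#!/bin/sh
# Hypothetical helpers (not part of smartmontools) for watching the SMART
# counters discussed in this thread. They parse "smartctl -A" output only.

# smart_raw NAME: read "smartctl -A" output on stdin and print the
# RAW_VALUE (10th column) of the attribute called NAME.
# Returns 1 if the attribute is not present.
smart_raw() {
    awk -v name="$1" '
        $2 == name { print $10; found = 1; exit }
        END { if (!found) exit 1 }'
}

# check_counts DEV [LIMIT]: read "smartctl -A" output on stdin, print the
# pending and reallocated sector counts for DEV, and return:
#   0 if Reallocated_Sector_Ct is at or below LIMIT (default 200, assumed),
#   1 if it is above LIMIT,
#   2 if it is missing or not a plain number.
check_counts() {
    local dev=$1 limit=${2:-200} smart pending realloc
    smart=$(cat)    # keep the text so it can be parsed twice
    pending=$(printf '%s\n' "$smart" | smart_raw Current_Pending_Sector) \
        || pending=unknown
    realloc=$(printf '%s\n' "$smart" | smart_raw Reallocated_Sector_Ct) \
        || realloc=unknown
    echo "$dev: pending=$pending reallocated=$realloc"
    case $realloc in
        ''|*[!0-9]*) return 2 ;;    # cannot judge a non-numeric value
    esac
    if [ "$realloc" -gt "$limit" ]; then
        echo "$dev: reallocated sectors above $limit" >&2
        return 1
    fi
    return 0
}
```

Usage would then be something like "smartctl -A /dev/sdb | check_counts sdb",
run as root (for example from cron) for each member drive.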
You should also keep an eye on Reallocated_Sector_Ct for sdb, though. Your
112 is still below Seagate's health limit (200), but it is fairly high and
indicates a "not so good" drive. If the count goes over 200, Seagate will
replace the drive.

If someone with more insight has objections to the procedure above, please
tell me. But this worked for me.

> Also check why you didn't get notification that sdc wasn't part of the
> array, usually mdmon or equivalent will send email about these events.

Good advice! Set up the notification email address for smartd as well!

Best
Dag