On Sun, June 14, 2009 5:10 pm, linux-raid.vger.kernel.org@xxxxxxxxxxx wrote:
> So here I was thinking everything was fine. My six disks were working
> for hours and the other two disks were loaded as spares and the first
> one was rebuilding, up to 30% with an ETA of 5 hours. I left the house
> for a few hours and when I came back, the same disk with read errors
> before had spontaneously disconnected and reconnected three times (I
> saw in dmesg). It probably got around 80% of the way through the six
> hour rebuild.
>
> The problem is that when the /dev/sdc disk reconnected itself after,
> it was marked as a "Spare", and now I can't use the same command any
> longer:

This doesn't make a lot of sense. It should not have been marked as a
spare unless someone explicitly tried to "add" it to the array. I've
been thinking that I need to improve mdadm in this respect and make it
harder to accidentally turn a failed drive into a spare. However, your
description of events suggests that this happened automatically, which
is strange.

Can I get the complete kernel logs from when the rebuild started to
when you finally gave up? They might help me understand.

> # mdadm --assemble /dev/md13 --verbose --force /dev/sd{a,b,c,d,e,f}1
>
> This time it doesn't work, as it says 5 disks and 1 spare isn't enough
> to start the array. I also tried --re-add, but it already thinks it
> is disk 9 out of 8, a Spare.
>
> How can I safely put this disk back into its proper place so I can
> again try to rebuild disks 7 and 8? I'm assuming I probably need to
> use mdadm --create, but I'm not sure, and don't want to get it wrong
> and have it overwrite this needed disk.

Yes, I suspect that you need --create, but I cannot be certain without
seeing all the details (e.g. --examine of all devices).

When using --create you need to ensure that the drives are in the
right order, with "missing" at the right places. As long as there are
two missing devices no resync will happen, so the data will not be
changed. So after doing a --create you can fsck and mount etc. and
ensure the data is safe before continuing.

But if you cannot get through a sequential read of all devices without
any read error, you won't be able to rebuild redundancy. (There are
plans to make raid6 more robust in this scenario, but they are a long
way from fruition yet.)

NeilBrown
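
P.S. Just as an illustration, not something to run blindly: assuming
the array really is an 8-device raid6 and that sd{a..f}1 sit in slots
0-5 (verify the device order against the role numbers that --examine
reports before trusting it), the recreate would look something like:

# mdadm --create /dev/md13 --level=6 --raid-devices=8 \
        --metadata=0.90 --chunk=64 \
        /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 \
        missing missing

The --metadata and --chunk values here are only guesses at the old
defaults; use exactly what --examine shows for the existing array.
With the two "missing" slots no resync starts, so nothing gets written
to the data. Then check it read-only before adding any spares back:

# fsck -n /dev/md13
# mount -o ro /dev/md13 /mnt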