Mark Hennessy wrote:
>
> I'm using Centos 4.5 right now, and I had a RAID 5 array stop because
> two drives became unavailable. After adjusting the cables on several
> occasions and shutting down and restarting, I was able to see the
> drives again. This is when I snatched defeat from the jaws of
> victory. Please, someone with vast knowledge of how RAID 5 with mdadm
> works, tell me if I have any chance at all that this array will pull
> through with most or all of my data.

It may be possible...

> Background info about the machine
> /dev/md0 is a RAID1 consisting of /dev/sda1 and /dev/sda2
> /dev/md1 is a RAID1 consisting of /dev/sda2 and /dev/sdb2
> /dev/md2 (our special friend) is a RAID5 consisting of /dev/sd[c-j]
>
> /dev/sdi and /dev/sdj were the drives that detached from the array and
> were marked as faulty.
>
> I did the following things that in hindsight were probably VERY BAD
>
> Step 1 (Misassign drives to wrong array):
> I could probably have had things going again in a tenth of a second if
> I hadn't typed this:
> mdadm --manage --add /dev/md0 /dev/sdi
> mdadm --manage --add /dev/md0 /dev/sdi
>
> This clobbered the superblock and replaced it with that of /dev/md0, yes?
> well, that's what mdadm --misc --examine /dev/sdi and sdj said anyhow.

Hmm, not good, but we will mark this drive 'sdi' as bad.

> Ok, so what next?
> Step 2 (rebuild the array but make sure the params are right!):
> I wipe out the superblocks on all of the drives in the array and
> rebuild with --assume-clean
> for i in c d e f g h i j ; do mdadm --zero-superblock /dev/sd$i ; done
> mdadm --create /dev/md2 --assume-clean --level=5 --raid-devices=8 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj

Nooo, you need to make sure sdi is marked as 'bad' and kept offline. You are going to need to assemble the array degraded, then add sdi as a replacement and let it rebuild sdi off the parity.
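To make the degraded-assemble route concrete, here is a rough, untested sketch. It assumes the superblocks on the remaining members have not been zeroed yet and that sdj still carries usable md2 metadata (neither is confirmed in this thread). The `plan` helper and the variable names are made up for illustration. It only prints the commands, so nothing touches the disks until each step has been reviewed:

```shell
#!/bin/sh
# Dry run only: prints the mdadm commands instead of executing them.
# Assumption: sdi is the drive treated as bad; the other seven members
# still have their original md2 superblocks.

ARRAY=/dev/md2
BAD=/dev/sdi
MEMBERS=""
for i in c d e f g h j; do        # every member except the bad one
    MEMBERS="$MEMBERS /dev/sd$i"
done

plan() {
    # 1. Inspect the superblocks first (read-only).
    echo "mdadm --examine$MEMBERS"
    # 2. Assemble degraded: 7 of 8 devices, started despite the missing one.
    echo "mdadm --assemble --force --run $ARRAY$MEMBERS"
    # 3. Confirm the array state before writing anything.
    echo "mdadm --detail $ARRAY"
    # 4. Only then add the replacement so it rebuilds from parity.
    echo "mdadm --manage $ARRAY --add $BAD"
}

PLAN=$(plan)
echo "$PLAN"
```

The `--force` is there because a member that was kicked out as faulty will probably look out of date to mdadm, and `--run` starts the array even though one device is missing. Whether that is enough in a given case depends on what `--examine` shows, so read that output before going past step 1.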
> ok, now it says that the array is recovering and will take about 10
> hours to rebuild.
> /dev/sd[c-i] say that they are "active sync" and
> /dev/sdj says it's a
> spare that's rebuilding.
> But now I scroll back in my history and see that oops, the chunk size
> is WRONG. Not only that, but I don't stop the array until the rebuild
> is at around 8%

Well, now I think it's all messed up.

> Ok, I stop the array and rebuild with
> mdadm --create /dev/md2 --assume-clean --level=5 --chunk --raid-devices=8 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj
>
> Now it says it's going to take another 10 hours to rebuild.

It's truly hosed now.

> How likely are my data irretrievable/gone and at what step would it
> have happened if so?

I hope you have backups, because you're going to need them.

If only you had posted to the list BEFORE you tried to recover it without knowing what to do.

-Ross

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos