Hi everyone,
Apologies in advance for the long email, but I've researched this as best
I can and I've run out of ideas.
I have a nice, new server mobo with an Intel 3420 chipset. I have two
1TB Seagate 7200.12 drives attached. I used the Intel OROM to set up
RAID1 across the two drives. I installed Fedora 11 with no issues and
encrypted the LVM during the install.
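If it helps to picture the stack, this is roughly how I've been inspecting
it (just a sketch; the isw set name and the luks-<uuid> mapping name below
are placeholders for whatever the real names are on my box):

    dmraid -s                      # show the Intel (isw) RAID set and its reported status
    dmsetup ls --tree              # device-mapper view: RAID set -> partitions -> dm-crypt -> LVM
    cryptsetup status luks-<uuid>  # the LUKS mapping opened at the Password: prompt (placeholder name)
    pvs; lvs                       # the LVM physical/logical volumes sitting on top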
A few days ago one of the drives reported a bad sector. The bad drive
was sdb. I wanted to remove the bad drive so that I could slip in the
new drive when it arrived. Here are the first steps I took:
First I attempted to shutdown and unplug the bad drive. Fedora wouldn't
boot -- gets to Password: prompt for encrypted partition. Correct
password is entered but the encrypted partition cannot be mounted. I
narrowed it down to that the /dev/sda1 and /dev/sda2 partitions are not
showing up so the kernel can't find the correct UUID from /etc/crypttab
to mount /. It previously used /dev/dm-1 I think, but this link has
vanished.
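For what it's worth, here's roughly how I compared what the boot scripts
want against what actually shows up, from the recovery shell (a sketch;
I'm not certain whether dmraid or kpartx is the one that's supposed to
create the partition mappings, so both are shown, and the isw name is a
placeholder):

    blkid                                   # UUIDs of whatever block devices are actually visible
    cat /etc/crypttab                       # the UUID the boot scripts expect for the LUKS volume
    ls /dev/mapper/                         # the dmraid set and any partition mappings that did appear
    dmraid -ay                              # try activating the RAID set by hand
    kpartx -a /dev/mapper/isw_<id>_Volume0  # recreate the mapped partitions if they're missing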
Second, I plugged the bad drive back in. Fedora boots normally.
Third, I got brave and removed the bad drive from the RAID using the
Intel OROM. I then turned off the machine and removed the bad drive.
Fedora won't boot: it gets past the password prompt, but during bootup it
cannot find my /boot partition and dumps me to a recovery shell.
Fourth, I plugged the bad drive back in and Fedora boots. The bad drive
is no longer in the RAID, but its old partitions are exposed. The UUID of
/dev/sdb1 matches what /boot expects, so it gets mounted as /boot.
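Assuming /boot is mounted by UUID in /etc/fstab (the Fedora default),
this is easy to confirm:

    grep boot /etc/fstab   # /boot is listed as UUID=...
    blkid /dev/sdb1        # the exposed partition on the pulled member carries that same UUID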
Fifth, I couldn't get the bad drive to rebuild, so I used a Fedora Live
USB install to start the rebuild. I rebooted and Fedora wouldn't boot: no
/dev/sda1 partition, and no /dev/sdb1 or /dev/sdb2 partitions!
Eventually I solved this by removing the "--rm_partitions" part from the
dmraid invocation in /etc/rc.sysinit. That allowed Fedora to find
/dev/sda1 and boot. It seems the device mapper mappings somehow broke
when the RAID broke, and I don't know how to fix that properly.
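For anyone following along, the change amounts to something like this (a
sketch, not the literal rc.sysinit contents, and the isw name is a
placeholder):

    # what the boot script was effectively doing: activate the set and
    # remove the raw member partitions (/dev/sda1, /dev/sdb1, ...)
    dmraid -ay --rm_partitions
    # what I changed it to, which leaves the raw partitions alone:
    dmraid -ay
    # and if the mapped partitions (isw_<id>_Volume0p1, ...) are missing,
    # this should recreate them:
    kpartx -a /dev/mapper/isw_<id>_Volume0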
I RMA'd the bad drive and now have a replacement. I installed the new
drive and told the Intel OROM to include it in my RAID1 volume, then
booted into Fedora. During boot I get mdadm messages saying it is adding
a drive, which I haven't seen before. Once Fedora is loaded, dmraid
claims my RAID status is 'ok', yet no rebuilding is happening and there
is no HDD activity. When I attempt to initiate a rebuild I get an error:
"ERROR: Unable to suspend device". Googling that exact string returns
*zero* results. To see any HDD activity at all I had to boot my Live USB
stick and run "dmraid -R raidset"; it took 3 hours for the HDD light to
turn off. I rebooted, and the Intel OROM still says "Rebuild". Argh!!
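For completeness, the rebuild attempt and the only status checks I could
find look like this (a sketch; isw_<id>_Volume0 stands in for my actual
set name, and I'm not sure this dmraid version even accepts the trailing
device argument):

    dmraid -R isw_<id>_Volume0 /dev/sdb  # ask dmraid to rebuild the set onto the new member
    dmraid -s isw_<id>_Volume0           # set status: ok / nosync / inconsistent / broken
    dmsetup status                       # a dm mirror target reports a synced/total region count here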
In the middle of all this I ran "dmraid -n" and saw *three* hard drives
in my RAID: the original sda drive, the new sdb drive, and a third entry
with the serial of the sdb drive but a ":0" appended. I don't see any way
of removing that entry. How did it even get there?
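The inspection commands I've been using, roughly (the ":0" entry shows up
in all of them; the mdadm lines assume a version new enough, around 3.0,
to read Intel/IMSM metadata):

    dmraid -n          # dump the raw isw metadata from each member disk
    dmraid -r          # list the disks dmraid thinks are RAID members
    mdadm -E /dev/sda  # cross-check: recent mdadm can read the same IMSM metadata
    mdadm -E /dev/sdb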
Now that I've shared my life story, I'm down to these questions:
- How do I properly rebuild my RAID1? The Intel OROM still says "Rebuild".
- Does dmraid not support rebuilding a live (mounted) array? It seems
silly that I had to use a Live USB load to rebuild.
- Does dmraid not report rebuild progress? I had no way to tell whether
the rebuild was happening other than the HDD light. (See the mdadm sketch
after this list.)
- How do I fix device mapper so I don't have to remove "--rm_partitions"
from /etc/rc.sysinit?
- How do I get my RAID metadata looking clean again, with no extra ghost
drives?
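Since the boot messages suggest mdadm is already touching the array, here
is the mdadm-side view I've started poking at (a sketch, assuming mdadm
>= 3.0 with Intel/IMSM container support; md126/md127 are just the usual
auto-assigned names, not necessarily mine):

    mdadm --assemble --scan  # assemble the IMSM container and the RAID1 volume inside it
    cat /proc/mdstat         # lists the volume, its members, and live resync/rebuild progress
    mdadm -D /dev/md126      # detailed state of the RAID1 volume (placeholder device name)
    mdadm -D /dev/md127      # detailed state of the IMSM container (placeholder device name)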
Thanks,
Michael
P.S. It seems the easiest way out is to just nuke the array and start
over, but I want to know why this is so hard... it seems dmraid is rather
"experimental", and Fedora is moving to mdadm anyway.