On Thu, Nov 5, 2009 at 7:33 PM, Michael Cronenworth <mike@xxxxxxxxxx> wrote:
> First I attempted to shutdown and unplug the bad drive. Fedora wouldn't boot
> -- gets to Password: prompt for encrypted partition. Correct password is
> entered but the encrypted partition cannot be mounted. I narrowed it down to
> that the /dev/sda1 and /dev/sda2 partitions are not showing up so the kernel
> can't find the correct UUID from /etc/crypttab to mount /. It previously
> used /dev/dm-1 I think, but this link has vanished.

[ First let me preface this by saying I am not a dmraid expert, so please
forgive me if I state something incorrectly below ]

I believe some versions of dmraid fail to assemble the volume if it is
marked degraded. When you booted with the drive missing, the option-ROM
noticed that the drive was missing and updated the metadata to reflect
that fact. Normally when dmraid assembles a raid set it removes the
partitions of the member devices, since the entire drive is managed by DM.
It sounds like in this case it failed to assemble the raid set but still
removed the partitions. I suspect, since this is a raid1, that you should
be able to boot the single good disk if dmraid is left out of the picture,
but this isn't what you want.

> Second, I plugged the bad drive back in. Fedora boots normally.

The option-ROM saw that you re-added a disk with out-of-date metadata and
marked the array as 'Rebuild'. Since the array is not degraded, dmraid
allows assembly and everything works like before.

> Third, I get brave and remove the bad drive from the RAID using the Intel
> OROM. I then turn off the machine and remove the bad drive. Fedora won't
> boot -- gets past password prompt, but during bootup it cannot find my /boot
> partition and dumps me to a recovery shell.

Yes, back to a degraded case like your first state.

> Fourthly, I plug the bad drive back in and Fedora boots. The bad drive is no
> longer in the RAID, but its old partitions are exposed.
> The UUID of /dev/sdb1 matches and so it mounts /boot.

In this case the act of removing the drive via the option-ROM erased the
RAID metadata on it, so dmraid sees nothing to claim and leaves /dev/sdb
alone. The partitions get exposed (i.e. not removed by dmraid), allowing
the crypto code to mount, but as you have probably guessed this isn't what
you want because the raid is now bypassed.

> Fifthly, I couldn't get the bad drive to rebuild so I used a Fedora Live USB
> install to start the rebuild. Rebooted and Fedora wouldn't boot. No
> /dev/sda1 partition and no /dev/sdb1 or /dev/sdb2 partitions!

You can't rebuild in this case because your root filesystem is mounted on
this drive. dmraid can't claim this drive for its exclusive use and leaves
the drive alone. Booting to the Live USB means /dev/sdb is no longer in
use. When you say "start the rebuild", do you mean you didn't allow it to
finish? I am not sure how the metadata updates are handled in the version
of dmraid you are using; maybe it waits to update the metadata until after
the rebuild is complete?

> Eventually I solved this by removing the "--rm_partitions" part in
> /etc/rc.sysinit. This allowed Fedora to find /dev/sda1 and boot. It seems
> somehow the device mapper mappings broke when the RAID broke. I don't know
> how to fix this.

In this case you are effectively mimicking your "Fourthly" with sda in
place of sdb.

> I RMA'd the bad drive and now I have a replacement drive. I installed the
> new drive and I told the Intel OROM to include it in my RAID1 volume. I
> booted into Fedora. During boot, I get mdadm messages that it is adding a
> drive. I haven't seen this before. This concerns me

I would hope that the Fedora 11 initramfs would disable mdadm when dmraid
is being used to activate a partition. To verify whether this conflict is
happening you would need to get a prompt in the initramfs and run
"cat /proc/mdstat" to see what's being claimed.
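As a rough sketch of that check: the helper below pulls the member disks
out of /proc/mdstat so it is easier to see at a glance what md has grabbed.
The md_members name is made up for illustration, and I am assuming awk is
available at the initramfs prompt, which may not hold on every image. If it
is not, reading the "cat /proc/mdstat" output by eye gives the same
information.

```shell
# Hypothetical helper (not part of mdadm or dmraid): list the member disks
# that md has claimed, one "array: disk" pair per line.
# $1 is an mdstat-format file; it defaults to the live /proc/mdstat.
md_members() {
    awk '/^md[^ ]* :/ {
        # Fields after "mdX :" hold the state, the level and the members.
        for (i = 3; i <= NF; i++)
            if ($i ~ /\[[0-9]+\]/) {
                disk = $i
                sub(/\[.*/, "", disk)   # "sda[1](S)" -> "sda"
                print $1 ": " disk
            }
    }' "${1:-/proc/mdstat}"
}
```

Run with no arguments, it reads the live kernel state. No output means md
has not claimed any disk. If sda or sdb are listed, md got to them first,
which would explain the mdadm messages during boot.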
> When Fedora is loaded, dmraid wants to
> claim my RAID status is 'ok' yet no rebuilding is happening. I have no HDD
> activity. I attempt to initiate a rebuild. I get an error: ERROR: Unable to
> suspend device. Google searching for this exact string returns *zero*
> results.

Perhaps this version of dmraid does not support online rebuild?

> In order to even see any HDD activity I booted into my LiveUSB
> stick and ran "dmraid -R raidset" and it took 3 hours for the HDD light to
> turn off. I reboot and Intel OROM still says "Rebuild". Argh!!

It looks like the "dmraid -R raidset" command is not modifying the
metadata after the rebuild completes?

> In the middle of all this I ran "dmraid -n" and I see *three* hard drives in
> my RAID. One is the original sda drive, the second is the new sdb drive, and
> the third is the serial of the sdb drive, but with a :0 at the end. I don't
> see any way of removing that drive. How did it even get there?

I don't know how/if dmraid modified the metadata, but the option-ROM will
retain a ghost disk entry until the array is rebuilt.

> Now that I've shared my life story, I'm down to these questions:
> - How do I properly rebuild my RAID1? The Intel OROM still says "rebuild"
> - Does dmraid not support live rebuilding? That seems silly that I had to
> use a LiveUSB load to rebuild.
> - Does dmraid not support rebuild status? I had no idea if the rebuild was
> occurring besides the HDD light.
> - How do I fix device mapper so I don't have to remove the "--rm_partitions"
> out of /etc/rc.sysinit?
> - How do I get my RAID metadata looking good? (no extra ghost drives)

Newer dmraid releases may handle the rebuild case better. However, I
suspect you should be able to rebuild it with mdadm via a Live USB/CD
image. This should allow you to get the array back into a state that will
make the dmraid in your Fedora 11 environment happy.
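For step 2 below, one way to check is to look for device-mapper maps whose
names start with "isw_", which is the prefix dmraid gives to Intel sets.
This is only a sketch: the isw_maps helper name is hypothetical, and I am
assuming dmsetup is present on the Live image.

```shell
# Hypothetical helper: read `dmsetup ls` output on stdin and print the names
# of maps that look like dmraid Intel sets (names starting with "isw_").
isw_maps() {
    awk '$1 ~ /^isw_/ { print $1 }'
}

# Usage on the Live image (needs root):
#   dmsetup ls | isw_maps
# If this prints anything, dmraid has assembled the set; deactivate it with
# `dmraid -an` before running the mdadm commands.
```

Empty output means there is nothing of dmraid's in the way and mdadm can
take the disks.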
0/ If you haven't already, get a backup of your one good drive in case
   something goes wrong with the following steps.
1/ Boot to a Live USB/CD image with a recent version of mdadm (>= 3.0).
2/ Make sure that dmraid has not assembled the disks.
3/ mdadm -A /dev/md/imsm /dev/sda # add the one good drive to an 'imsm container'
4/ mdadm -I /dev/md/imsm # start the container
5/ cat /proc/mdstat # verify that your raid volume was started in degraded mode
6/ mdadm --add /dev/md/imsm /dev/sdb # add the new disk to the container which starts the rebuild
7/ <wait for rebuild to complete>
8/ mdadm -E /dev/sda # dump the metadata and check that it is no longer marked 'Degraded'/'Rebuild'
9/ mdadm -Ss # stop the array
10/ Boot back into Fedora 11 and let dmraid assemble the array normally.

--
Dan

_______________________________________________
Ataraid-list mailing list
Ataraid-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ataraid-list