Hi Jayson,
Thanks for all the detailed information yesterday. I've done some more
digging into my system, and I wonder if you'd be willing to comment on
what I found, and the recovery procedure I'm considering.
Quick summary of situation:
- machine comes up, but LVM builds / on top of /dev/sdb3 instead of
/dev/md2, of which /dev/sdb3 is a member
- looks like md2 isn't starting, so I need to fix it (presumably
offline, using a LiveCD), then reboot and get LVM to use the mirror device
What's confusing is that the RAID isn't starting at boot time, yet it
shows a different status depending on which tool I use. So first I have
to get the RAID working again and make sure it has the up-to-date data.
Here are some more details, broken into four sections: RAID, LVM, boot
process, and recovery procedure. The RAID section has a summary at the
front followed by detailed command listings; the other sections are
much shorter :-).
Comments on the recovery procedure, please!
---------- re. the RAID array ----------
summary:
- /proc/mdstat thinks the array is inactive, containing sdb3 and sdd3
- mdadm thinks it's active, degraded, also containing sdb3 and sdd3
(mdadm -D /dev/md2)
- looking at superblocks, mdadm seems to think it's active, degraded
(mdadm -E /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3)
-- containing sda3 only (mdadm -E /dev/sda3)
-- containing sda3, with sdb3 as a spare (mdadm -E /dev/sdb3)
-- containing sda3 and sdb3, with sdc3 as a spare (mdadm -E /dev/sdc3) -
with the same magic number but a different UUID from the above
-- no superblock on /dev/sdd3 (mdadm -E /dev/sdd3)
details:
more /proc/mdstat:
md2 : inactive sdd3[0] sdb3[2]
195318016 blocks
<looking at RAID>
mdadm -D /dev/md2:
/dev/md2:
Version : 00.90.01
Creation Time : Thu Jul 20 06:15:18 2006
Raid Level : raid1
Device Size : 97659008 (93.13 GiB 100.00 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Fri Apr 3 10:06:41 2009
State : active, degraded
Active Devices : 0
Working Devices : 2
Failed Devices : 0
Spare Devices : 2
Number Major Minor RaidDevice State
0 8 51 0 spare rebuilding /dev/sdd3
1 0 0 - removed
2 8 19 - spare /dev/sdb3
<looking at component devices>
server1:/etc/lvm# mdadm -E /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
/dev/sda3:
Magic : a92b4efc
Version : 00.90.00
UUID : 3a32acee:8a132ab9:545792a8:0df49d99
Creation Time : Thu Jul 20 06:15:18 2006
Raid Level : raid1
Raid Devices : 2
Total Devices : 1
Preferred Minor : 2
Update Time : Fri Apr 3 22:40:39 2009
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Checksum : 71d21f34 - correct
Events : 0.114704240
Number Major Minor RaidDevice State
this 0 8 3 0 active sync /dev/sda3
0 0 8 3 0 active sync /dev/sda3
1 1 0 0 1 faulty removed
/dev/sdb3:
Magic : a92b4efc
Version : 00.90.00
UUID : 3a32acee:8a132ab9:545792a8:0df49d99
Creation Time : Thu Jul 20 06:15:18 2006
Raid Level : raid1
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Update Time : Fri Apr 3 10:06:41 2009
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 1
Spare Devices : 1
Checksum : 71d1d1fa - correct
Events : 0.114716950
Number Major Minor RaidDevice State
this 2 8 19 2 spare /dev/sdb3
0 0 8 3 0 active sync /dev/sda3
1 1 0 0 1 faulty removed
2 2 8 19 2 spare /dev/sdb3
/dev/sdc3:
Magic : a92b4efc
Version : 00.90.00
UUID : 635fb32e:6a83a5be:12735af4:74016e66
Creation Time : Wed Jul 2 12:48:36 2008
Raid Level : raid1
Raid Devices : 2
Total Devices : 3
Preferred Minor : 2
Update Time : Fri Apr 3 06:42:50 2009
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
Checksum : 95973481 - correct
Events : 0.26
Number Major Minor RaidDevice State
this 2 8 35 2 spare /dev/sdc3
0 0 8 3 0 active sync /dev/sda3
1 1 8 19 1 active sync /dev/sdb3
2 2 8 35 2 spare /dev/sdc3
mdadm: No super block found on /dev/sdd3 (Expected magic a92b4efc, got
00000000)
<looking at devices with --scan>
server1:/etc/lvm# mdadm -E --scan /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
ARRAY /dev/md2 level=raid1 num-devices=2
UUID=635fb32e:6a83a5be:12735af4:74016e66
devices=/dev/sdc3
ARRAY /dev/md2 level=raid1 num-devices=2
UUID=3a32acee:8a132ab9:545792a8:0df49d99
devices=/dev/sda3,/dev/sdb3
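Looking ahead to my mdadm.conf question below: my assumption is that
once md2 is consistent again, I'd regenerate its config entry rather
than hand-edit it - roughly like this, with /etc/mdadm/mdadm.conf being
my guess at the path on this box:
# print a fresh ARRAY line for the surviving UUID
mdadm --detail --brief /dev/md2
# then paste that over the stale ARRAY line for md2 in
# /etc/mdadm/mdadm.conf, leaving the DEVICE line alone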
-------- re. LVM ---------
/etc/lvm/lvm.conf contains the line:
md_component_detection = 0
I expect that setting it to 1 would tell LVM to detect md component
devices and prefer the assembled RAID device.
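If I've got that right, the relevant bit of /etc/lvm/lvm.conf would end
up looking roughly like this (the surrounding devices section is from
memory, so treat it as a sketch):
devices {
    # recognise md component partitions (e.g. /dev/sdb3) and skip them
    # in favour of the assembled array (/dev/md2)
    md_component_detection = 1
}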
Also, /etc/lvm/backup/rootvolume contains:
pv0 {
id = "2ppSS2-q0kO-3t0t-uf8t-6S19-qY3y-pWBOxF"
device = "/dev/md2" # Hint only
which suggests that if the RAID is running, LVM will do the right thing.
---------- re. boot process ------------
The detailed sequence of events looks like:
- MBR loads grub
- grub knows about md and lvm, mounts read-only
-- kernel /vmlinuz-2.6.8-3-686 root=/dev/mapper/rootvolume-rootlv
ro mem=4
- during main boot md comes up first, then lvm
-- from rcS.d/S25mdadm-raid: if not already running ... mdadm -A -s -a
---- I'm guessing this fails for /dev/md2
-- from rcS.d/S26lvm:
-- creates lvm device
-- creates dm device
-- does a vgscan
---- which is where this happens:
Found duplicate PV 2ppSS2q0kO3t0tuf8t6S19qY3ypWBOxF: using /dev/sdb3
not /dev/sda3
Found volume group "backupvolume" using metadata type lvm2
Found volume group "rootvolume" using metadata type lvm2
-- does a vgchange -a y
---- which looks like it's picking up on sdb3
-- I'm guessing that if the mirror were active and built on /dev/sdb3,
LVM would pick up /dev/md2 as the volume group's PV instead
** is this where setting md_component_detection = 1 would be helpful?
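As a sanity check on that guess, I assume I can see which underlying
device LVM has actually bound the PV to with something like the
following (I'm not sure the LVM2 on this old box has pvs; pvdisplay
should work either way):
# show the PV(s) and which device each was found on
pvdisplay /dev/md2 /dev/sdb3 2>/dev/null
# or, more compactly, if pvs exists here:
pvs -o pv_name,vg_name,pv_uuid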
------------ recovery procedure ------------
Here's what I'm thinking of doing - comments, please!
1. turn logging on in lvm.conf (settings sketched below, after the game
plan), reboot, and examine the logs to confirm the guesses above (or
find out what's really happening)
-- based on the logging, maybe set md_component_detection = 1 in lvm.conf
2. shutdown, boot from LiveCD (I'm using systemrescuecd - great tool by
the way)
3. back up /dev/sdb3 using partimage (just in case!)
4. try to fix /dev/md2 (rough command sketch after the game plan below)
if it's not running - start it with only /dev/sdb3, then add the other
devices back in:
- mdadm -A --run /dev/md2 /dev/sdb3 (**is this the right way to do
this?**)
- add the other devices back (mdadm /dev/md2 -a /dev/sda3; mdadm
/dev/md2 -a /dev/sdd3)
- grow to 3 active devices: mdadm --grow -n 3 /dev/md2
if it's running:
- fail all except /dev/sdb3 (mdadm /dev/md2 -f /dev/sda3; mdadm
/dev/md2 -f /dev/sdd3)
- remove them (mdadm /dev/md2 -r /dev/sda3; mdadm /dev/md2 -r
/dev/sdd3)
- add them back (mdadm /dev/md2 -a /dev/sda3; mdadm /dev/md2 -a
/dev/sdd3)
- grow to 3 active devices: mdadm --grow -n 3 /dev/md2
question: do I need to update mdadm.conf?
question: do I need to do anything to get rid of the superblock
containing a different UUID (on /dev/sdc3)?
5. reboot the system
- it may just come up
- if it comes up and lvm is still operating off a single partition,
repeat the above, but first add a filter to lvm.conf (sketched below,
after the game plan; wash, rinse, repeat as necessary)
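Written out, the commands I have in mind for steps 3 and 4 look roughly
like this - device names follow my reading of the listings above, the
backup path is just a placeholder, and zeroing sdc3's superblock is my
guess at handling the stray UUID, so please correct me where I've got
it wrong:
# step 3: image the partition I'm keeping, just in case
partimage save /dev/sdb3 /mnt/backup/sdb3.partimg
# step 4: if md2 isn't running, assemble it degraded from sdb3 alone
mdadm --assemble --run /dev/md2 /dev/sdb3
# (if it were already running with stale members, I'd first
#  mdadm /dev/md2 --fail ... and mdadm /dev/md2 --remove ...
#  for everything except sdb3)
# clear the superblock carrying the other UUID before re-adding anything
mdadm --zero-superblock /dev/sdc3
# add the other partitions back and let them resync
mdadm /dev/md2 --add /dev/sda3
mdadm /dev/md2 --add /dev/sdd3
# grow to 3 active devices and watch the rebuild
mdadm --grow /dev/md2 --raid-devices=3
watch cat /proc/mdstat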
*** does this seem like a reasonable game plan? ***
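And for reference, the lvm.conf changes behind steps 1 and 5 would look
something like this - the option names are from the lvm.conf man page
as best I remember them, so please check the details, especially the
filter regexes:
log {
    # step 1: verbose logging to a file so I can see what vgscan and
    # vgchange actually do during boot
    file = "/var/log/lvm2.log"
    level = 7
    verbose = 1
}
devices {
    # step 5: accept the md array, reject its raw component partitions,
    # and accept everything else (first matching pattern wins)
    filter = [ "a|^/dev/md|", "r|^/dev/sd[abcd]3$|", "a|.*|" ]
}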
Thanks again for your help!
Miles
_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/