On Tue, 2006-02-07 at 17:34 -0500, Peter Jones wrote:
> On Tue, 2006-02-07 at 00:17 -0700, Dax Kelson wrote:
> > On Mon, 2006-02-06 at 21:02 -0500, Peter Jones wrote:
> > > On Mon, 2006-02-06 at 13:08 -0700, Dax Kelson wrote:
> > > > The standard root=LABEL=/ was used on the kernel command line,
> > > > and what happened is that it booted up to one side of the
> > > > mirror. All the update and new-package activity (including a
> > > > new kernel install, which modified the grub.conf) happened on
> > > > that one side of the mirror.
>
> Are you sure about this? Your blkid.tab looks very much like you used
> the default layout on Jan 13...

My paragraph was about "event one", which got blown away and
reinstalled over. The blkid.tab I posted was from the next install I
did, which used the auto-layout feature of Disk Druid.

> > > This should be fixed in the current rawhide tree.
> >
> > And now it uses root=/dev/mapper/$DEV ?
>
> No, it still uses root=LABEL=/ (assuming no lvm), but the label
> searching mechanism early in the boot process is now the same as that
> used by mount, umount, swapon, etc., and it currently gives
> device-mapper devices a higher "priority", which should guarantee
> that, assuming it's possible to build the raid, all of those tools
> will use the dm device instead of the normal disks.

Good to know.
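For the archives, that shared lookup is easy to poke at directly; here
is a sketch of the check I'll use (blkid's -l asks it for its single
best match, and findfs wraps the same search):

    # sda2 and sdb2 both carry LABEL=/boot in the blkid.tab below,
    # so ask which device wins the lookup
    blkid -l -t LABEL=/boot

    # findfs does the same search and prints just the device
    findfs LABEL=/boot

If the priority behavior you describe is working, a device-mapper node
should win that lookup whenever one carries the label.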
> So your blkid.tab says:
>
> > <device DEVNO="0xfd01" TIME="1139069826" PRI="40" TYPE="swap">/dev/dm-1</device>
> > <device DEVNO="0xfd05" TIME="1137182541" PRI="40" TYPE="swap">/dev/dm-5</device>
> > <device DEVNO="0xfd02" TIME="1137182541" PRI="40" TYPE="ntfs">/dev/dm-2</device>
> > <device DEVNO="0xfd04" TIME="1137182541" PRI="40" UUID="faffb8d3-2562-4489-a1f8-a7e0077e1e6c" SEC_TYPE="ext2" TYPE="ext3">/dev/dm-4</device>
> > <device DEVNO="0x0801" TIME="1137182541" TYPE="ntfs">/dev/sda1</device>
> > <device DEVNO="0x0802" TIME="1139162151" LABEL="/boot" UUID="f49b0225-bdd4-430a-a3b0-f0f7c20daaff" SEC_TYPE="ext2" TYPE="ext3">/dev/sda2</device>
> > <device DEVNO="0x0811" TIME="1137182541" TYPE="ntfs">/dev/sdb1</device>
> > <device DEVNO="0x0812" TIME="1137182541" LABEL="/boot" UUID="f49b0225-bdd4-430a-a3b0-f0f7c20daaff" SEC_TYPE="ext2" TYPE="ext3">/dev/sdb2</device>
> > <device DEVNO="0x0813" TIME="1137182541" TYPE="swap">/dev/sdb3</device>
> > <device DEVNO="0xfd03" TIME="1137182541" TYPE="swap">/dev/dm-3</device>
> > <device DEVNO="0xfd01" TIME="1139162137" TYPE="swap">/dev/VolGroup00/LogVol01</device>
>
> OK, archeology time. On Jan 13, 2006 at about 8pm GMT you installed
> with a disk layout something like:
>
>   /dev/sda /dev/sdb     -> dm-1 (which would not have gotten an
>                            entry in blkid.tab)
>   /dev/sda1 /dev/sdb1   -> dm-2 ntfs (PRI=40, whereas sda1 and sdb1
>                            have PRI=0)
>   /dev/sda2 + /dev/sdb2 -> VolGroup00 (no device node, thus no entry)
>   VolGroup00 ->
>     dm-3 (LogVol01) -> swap
>     dm-4 (LogVol00) -> /
>
> (dm-3 vs dm-4 reflects the order they were activated, not necessarily
> the order on disk)
>
> *something happened here, no idea what*
>
> Sometime around Feb 4, 2006, at 4pm GMT you rebooted, and the raid
> didn't get started. This looks like one of your disks wasn't
> connected at all, and the other was doing weird things. LVM brought
> up LogVol01, but if both disks were there it would have been
> complaining about inconsistent VG metadata for VolGroup00. For
> whatever reason, LogVol00 _didn't_ come back up. /boot may or may not
> have been mounted; we can't say.

Physically, the disks and their cables haven't been touched. I
remember a big yum update that segfaulted halfway through. Maybe
related?

> 25 hours later you walked back into the room and power cycled the
> box. Then about 26 hours later you rebooted again. This time, for
> some reason, the /boot record on /dev/sda2 was modified. This may
> indicate that sda2 was missing the previous time we booted far enough
> to get / mounted rw. Once again VolGroup00/LogVol01 was activated
> correctly, but / was not.
>
> The last 2 lines have no PRI= section; that's weird, and might mean
> my leaf-node test in libblkid is broken. That shouldn't cause the
> other failures we've seen, though.
>
> From what you say below I'm assuming something went wrong making your
> initrd on the 4th.
>
> > GRUB always sees the "activated" RAID because of the BIOS RAID
> > driver. When it reads the grub.conf it is interleaving pieces of
> > the two (now different) grub.conf files, and the result most likely
> > has bogus syntax and content.
>
> Well, yes and no. It sees a disk as 0x80, and when it does int 13h,
> the BIOS decides which disk it's going to send that to. How it
> decides is anybody's guess; I'm sure it varies wildly between BIOSes.
>
> > Jan 14th 2006 rawhide for event one, and a Jan 14th 2006 initial
> > install with yum updates every couple of days for event two.
>
> Looks like the 13th, but either should be sufficient.
>
> > > > On bootup I noticed an error flash by, something to the effect
> > > > of "LVM ignoring duplicate PV".
>
> This is the inconsistent metadata error I mentioned above, FWIW.
>
> > I booted to the rescue environment with a Jan 14th boot.iso and NFS
> > tree. The rescue environment properly activated the dmraid, and
> > "pvdisplay" showed "/dev/mapper/nvidia-foo".
> >
> > I looked inside the two initrd files I had:
> >
> > 2.6.15-1.1884 = dm commands inside "init"
>
> OK, so that should work assuming you didn't move the disks around,
> etc. (I'm working on making moving the disks around OK, but it's a
> bit complicated, so it might take a while.)
>
> > 2.6.15-1.1889 = no dm commands inside "init" -- dated Feb 4th on
> > my box
>
> OK, so if you boot this you're going to get /dev/sda* accessed. Any
> idea what versions of e2fsprogs, lvm2, util-linux, device-mapper, and
> mkinitrd were installed? (I'll understand if you don't...)

I can look and see tonight when I get home. My guess right now is
whatever was in rawhide at that time.
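For reference, this is roughly how I looked inside the two images (a
sketch; it assumes the initrds are gzipped cpio archives, as they are
for these 2.6 kernels, and the exact /boot filenames may differ on
your box):

    # list what the image contains
    zcat /boot/initrd-2.6.15-1.1884.img | cpio -it

    # dump the init script and look for device-mapper setup commands
    zcat /boot/initrd-2.6.15-1.1884.img | cpio -i --to-stdout init | grep -i dm

Running the same grep against the 2.6.15-1.1889 image is how I saw the
dm commands were missing.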
> So that means when you installed that, you were either already
> booted without using raid, or mkinitrd (or one of the many tools it
> uses) was broken.

Yes. I'm sure I didn't closely observe every bootup sequence, or I
wasn't even in the room while the boot occurred, so it would have been
easy for me to miss something.

> > > One interesting note is that given any of these you should be
> > > getting the same disk mounted each time. Which means there's a
> > > good chance that sda and sdb are both fine; one of them just
> > > happens to represent your machine 3 weeks ago.
> >
> > It installed OK on Jan 14th, and had been successfully booting and
> > using the dmraid until (I think) Feb 4th.
>
> Looks like there was at least one problem before that, or mkinitrd
> couldn't find the raid devices when you updated that day.
>
> > > Do you still have this disk set, or have you wiped it and
> > > reinstalled already? If you've got it, I'd like to see
> > > /etc/blkid.tab from either disk (both if possible).
> >
> > Since the / filesystem is in an LVM LV sitting on top of a dmraid
> > partition PV, it seems non-trivial to force the PV for the LV to
> > change back and both to access the separate files. If you know a
> > way, let me know.
>
> export the metadata, use vim to rename the volume group, reimport
> the metadata. I can't recall the commands off the top of my head
> right now...
>
> alternatively, you can add this to the "devices" subsection
> of /etc/lvm/lvm.conf:
>
>     filter = [ "r|sda|" ]
>
> and it'll no longer look at anything with "sda" in the name.

I can attempt this if you still want.

Dax Kelson
Guru Labs
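P.S. For my own notes before I try it: my best guess at the metadata
round trip described above (untested, so treat it as a sketch;
vgcfgbackup/vgcfgrestore are the commands I'd reach for, and
/tmp/vg.txt is just an arbitrary scratch file):

    # dump the VolGroup00 metadata to an editable text file
    vgcfgbackup -f /tmp/vg.txt VolGroup00

    # change the volume group name inside the dump
    vim /tmp/vg.txt

    # write the edited metadata back, naming the VG as it now
    # appears in the edited file
    vgcfgrestore -f /tmp/vg.txt VolGroup00

Combined with the sda filter above, that should (if I understand it
right) let me activate the two halves side by side and diff the files.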