--- On Mon, 10/13/08, Michael H. Warfield <mhw@xxxxxxxxxxxx> wrote:

> From: Michael H. Warfield <mhw@xxxxxxxxxxxx>
> Subject: Re: Rawhide does not boot since 2.6.27-0.398
> To: olivares14031@xxxxxxxxx, "For testers of Fedora Core development releases" <fedora-test-list@xxxxxxxxxx>
> Cc: mhw@xxxxxxxxxxxx
> Date: Monday, October 13, 2008, 7:56 AM
>
> On Sat, 2008-10-11 at 12:48 -0700, Antonio Olivares wrote:
> > --- On Sat, 10/11/08, Bruno GARDIN <bgardin@xxxxxxxxx> wrote:
> > > From: Bruno GARDIN <bgardin@xxxxxxxxx>
> > > Subject: Rawhide does not boot since 2.6.27-0.398
> > > To: "For testers of Fedora Core development releases" <fedora-test-list@xxxxxxxxxx>
> > > Date: Saturday, October 11, 2008, 10:55 AM
> > >
> > > I have been testing rawhide for a few months now, but I have had a
> > > boot problem since kernel 2.6.27-0.398. My rawhide is a virtual
> > > system on vmware server, now in version 2.0. Whenever I try to
> > > boot, I get the following errors at the end:
> > >
> > > Activating logical volumes
> > > Volume group "VolGroup00" not found
> > > Unable to access resume device (/dev/VolGroup00/LogVol01)
> > > Creating root device
> > > Mounting root file system
> > > mount: error mounting /dev/root on /sysroot as ext3: No such file or directory
> > > Setting up other filesystems
> > > setuproot: moving /dev failed: No such file or directory
> > > setuproot: error mounting /proc: No such file or directory
> > > setuproot: error mounting /sys: No such file or directory
> > > Mount failed for selinuxfs on /selinux: No such file or directory
> > > Switching to new root and running init
> > > switchroot: mount failed: No such file or directory
> > > Booting has failed
> > >
> > > Boot works fine with kernel 2.6.27-0.382 but also fails with 2.6.27-1.
> > > I have looked at the thread related to ext4, but I am using ext3. I
> > > have also tried a new mkinitrd on 2.6.27-1, but no change. Any idea
> > > what the problem could be?
>
> The real source of the problem was much earlier in the messages than
> what was originally provided. I've been trying to track this down
> myself. Here is the critical bit of information:
>
> Kernel that boots:
>
> Loading dm-mirror module
> scsi 2:0:0:0: Direct-Access
> scsi target2:0:0: Beginning Domain Validation
> scsi target2:0:0: Domain Validation skipping write tests
> scsi target2:0:0: Ending Domain Validation: 1204k
> scsi target2:0:0 FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
> Loading dm-zero module
> sd 2:0:0:0: [sda] 25165824 512-byte hardware sectors (12885 MB)
> sd 2:0:0:0: [sda] Write Protect is off
> sd 2:0:0:0: [sda] Cache data unavailable
> sd 2:0:0:0: [sda] Assuming drive cache: write through
> Loading dm-snapshot module
> sd 2:0:0:0: [sda] 25165824 512-byte hardware sectors (12885 MB)
> sd 2:0:0:0: [sda] Write Protect is off
> sd 2:0:0:0: [sda] Cache data unavailable
> sd 2:0:0:0: [sda] Assuming drive cache: write through
> sda: sda1 sda2
> sd 2:0:0:0: [sda] Attached SCSI disk
> sd 2:0:0:0: Attached scsi generic sg1 type 0
> Making device-mapper control node
> Scanning logical volumes
> Reading all physical volumes. This may take a while...
> Found volume group "VolGroup00" using metadata type lvm2
>
> Kernel that fails:
>
> Scanning logical volumes
> scsi 2:0:0:0: Direct-Access
> scsi target2:0:0: Beginning Domain Validation
> scsi target2:0:0: Domain Validation skipping write tests
> scsi target2:0:0: Ending Domain Validation: 1204k
> scsi target2:0:0 FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
> Reading all physical volumes. This may take a while...
> sd 2:0:0:0: [sda] 25165824 512-byte hardware sectors (12885 MB)
> sd 2:0:0:0: [sda] Write Protect is off
> sd 2:0:0:0: [sda] Cache data unavailable
> sd 2:0:0:0: [sda] Assuming drive cache: write through
> sd 2:0:0:0: [sda] 25165824 512-byte hardware sectors (12885 MB)
> sd 2:0:0:0: [sda] Write Protect is off
> sd 2:0:0:0: [sda] Cache data unavailable
> sd 2:0:0:0: [sda] Assuming drive cache: write through
> sda: sda1 sda2
> sd 2:0:0:0: [sda] Attached SCSI disk
> sd 2:0:0:0: Attached scsi generic sg1 type 0
> Activating logical volumes
> Volume group "VolGroup00" not found
> Unable to access resume device (/dev/VolGroup00/LogVol01)
>
> Note that in the kernel that's failing, LVM starts scanning for
> logical volumes before the SCSI devices have stabilized! It doesn't
> find any PVs, and so it doesn't find "VolGroup00". That's the killer.
> Now look closely at the one that booted above. What's interspersed
> with the scsi startup messages? Loading dm-mirror, loading dm-zero,
> loading dm-snapshot. Then we see the "Scanning logical volumes".
> Well, guess what. Those modules are not present in the latest kernel
> as modules. They created enough of a time delay that lvm started
> after the scsi drivers had settled. Now that they are not there, we
> have a race condition and, if lvm starts too early, it can't find the
> drives. The "Scanning logical volumes" starts exactly where we see
> the "Loading dm-mirror" in the kernel which does boot. I would call
> that a smoking gun.
>
> What's really interesting (to me) is that if you look at the 2.6.26
> kernels under F9 (and I'm testing this 2.6.27 kernel under F9 as well
> as F10), you find ALL of the "Loading dm-*" messages AFTER the scsi
> drivers have settled. Something has changed in the 2.6.27 kernels:
> either the scsi drivers are not settling as fast (debugging messages
> and checks, perhaps) or the insmod is returning sooner (before the
> drivers have settled), creating a race condition which did not exist
> at all in 2.6.26.
>
> I think this is a bug in mkinitrd: it's not emitting a wait when one
> is needed in this case. Down in mkinitrd, around line 1483, is a
> check for the conditions under which it issues a wait for the scsi
> devices to settle. I think that either needs to be made unconditional
> or at least expanded to include other scsi devices like the VMware
> ones. I cheated: up at line 1411 I changed "wait_for_scsi="no"" to
> "wait_for_scsi="yes"" and rebuilt the initrd. The problem goes away
> and lvm starts AFTER the scsi devices have settled. Another way to do
> this might be to force the inclusion of usb-storage (mkinitrd
> --with-usb), which has its own delay both in loading and settling.
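
Spelled out, the workaround Mike describes amounts to something like
the following. This is only a sketch of his admitted hack: the line
number is specific to that mkinitrd version, the kernel version string
is just an example, and the --with= form shown is the standard
mkinitrd option for pulling an extra module into the image.

    # 1) Edit /sbin/mkinitrd and flip the default (around line 1411 in
    #    the version discussed here; line numbers will vary):
    #        wait_for_scsi="no"   ->   wait_for_scsi="yes"
    # 2) Rebuild the initrd for the affected kernel (adjust the version
    #    to whatever is actually installed):
    mkinitrd -f /boot/initrd-2.6.27-1.fc10.img 2.6.27-1.fc10
    # Alternative: force usb-storage into the initrd so its own load
    # and settle delay gives the scsi devices time to appear:
    mkinitrd -f --with=usb-storage /boot/initrd-2.6.27-1.fc10.img 2.6.27-1.fc10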
>
> > > --
> > > BeGe
> >
> > Try uninstalling the kernel and reinstalling it via yum.
> > I tried several times and succeeded :)
>
> No you didn't. You got lucky and won the race that time. That's the
> problem with race conditions. Sometimes you win through just plain
> dumb luck. You may well find it fails on a subsequent reboot and
> you're screwed. Then again, you may well have changed something else
> that changes the timing, and the probability, and it then works for
> you.
>
> I'll be filing a bugzilla bug on this later if nobody else gets to it
> first.
>
> > Regards,
> > Antonio
>
> Mike
> --
> Michael H. Warfield (AI4NB) | (770) 985-6132 | mhw@xxxxxxxxxxxx
>    /\/\|=mhw=|\/\/          | (678) 463-0932 | http://www.wittsend.com/mhw/
>    NIC whois: MHW9          | An optimist believes we live in the best of all
>  PGP Key: 0xDF1DD471        | possible worlds.  A pessimist is sure of it!

Michael,

You have a very valid point. My problem was that one partition (not the
boot partition) is ext4 but was listed as ext4dev, and the filesystem
type name has since changed to plain ext4. I had to change /etc/fstab
from ext4dev to ext4, then reinstall the kernel, and it worked :) But
you are correct about the scsi/dm volume group (LVM) race.

Regards,

Antonio
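
For reference, the /etc/fstab edit described above only touches the
filesystem type field on the affected line; with an illustrative device
path and mount point (not taken from the actual system), the change
looks roughly like this:

    # before:
    /dev/VolGroup00/LogVol02  /home  ext4dev  defaults  1 2
    # after:
    /dev/VolGroup00/LogVol02  /home  ext4     defaults  1 2

Since mkinitrd consults /etc/fstab when deciding which filesystem
modules to put in the initrd, reinstalling the kernel afterwards (which
regenerates the initrd) is presumably what picked up the corrected type.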