Antonio, a very clear explanation indeed. Bug 466071 has already been filed on this problem. I am going to test your proposed workaround. Michael, you can disregard my preceding message, as I had not yet read Antonio's explanation. Regards.

On Mon, Oct 13, 2008 at 5:42 PM, Antonio Olivares <olivares14031@xxxxxxxxx> wrote:
> --- On Mon, 10/13/08, Michael H. Warfield <mhw@xxxxxxxxxxxx> wrote:
>
>> From: Michael H. Warfield <mhw@xxxxxxxxxxxx>
>> Subject: Re: Rawhide does not boot since 2.6.27-0.398
>> To: olivares14031@xxxxxxxxx, "For testers of Fedora Core development releases" <fedora-test-list@xxxxxxxxxx>
>> Cc: mhw@xxxxxxxxxxxx
>> Date: Monday, October 13, 2008, 7:56 AM
>>
>> On Sat, 2008-10-11 at 12:48 -0700, Antonio Olivares wrote:
>> > --- On Sat, 10/11/08, Bruno GARDIN <bgardin@xxxxxxxxx> wrote:
>> >
>> > > From: Bruno GARDIN <bgardin@xxxxxxxxx>
>> > > Subject: Rawhide does not boot since 2.6.27-0.398
>> > > To: "For testers of Fedora Core development releases" <fedora-test-list@xxxxxxxxxx>
>> > > Date: Saturday, October 11, 2008, 10:55 AM
>> > >
>> > > I have been testing rawhide for a few months now, but I have had
>> > > boot problems since kernel 2.6.27-0.398. My rawhide is a virtual
>> > > system on VMware Server, now at version 2.0. Whenever I try to
>> > > boot, I get the following errors at the end:
>> > >
>> > > Activating logical volumes
>> > > Volume group "VolGroup00" not found
>> > > Unable to access resume device (/dev/VolGroup00/LogVol01)
>> > > Creating root device
>> > > Mounting root file system
>> > > mount: error mounting /dev/root on /sysroot as ext3:
>> > > No such file or directory
>> > > Setting up other filesystems
>> > > setuproot: moving /dev failed: No such file or directory
>> > > setuproot: error mounting /proc: No such file or directory
>> > > setuproot: error mounting /sys: No such file or directory
>> > > Mount failed for selinuxfs on /selinux: No such file or directory
>> > > Switching to new root and running init
>> > > switchroot: mount failed: No such file or directory
>> > > Booting has failed
>> > >
>> > > Boot works fine with kernel 2.6.27-0.382, but it also fails with
>> > > 2.6.27-1. I have looked at the thread related to ext4, but I am
>> > > using ext3. I have also tried a new mkinitrd on 2.6.27-1, but no
>> > > change. Any idea what the problem could be?
>>
>> The real source of the problem was much earlier in the messages than
>> what was originally provided. I've been trying to track this down
>> myself. Here is the critical bit of information:
>>
>> Kernel that boots:
>>
>> Loading dm-mirror module
>> scsi 2:0:0:0: Direct-Access
>> scsi target2:0:0: Beginning Domain Validation
>> scsi target2:0:0: Domain Validation skipping write tests
>> scsi target2:0:0: Ending Domain Validation: 1204k
>> scsi target2:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
>> Loading dm-zero module
>> sd 2:0:0:0: [sda] 25165824 512-byte hardware sectors (12885 MB)
>> sd 2:0:0:0: [sda] Write Protect is off
>> sd 2:0:0:0: [sda] Cache data unavailable
>> sd 2:0:0:0: [sda] Assuming drive cache: write through
>> Loading dm-snapshot module
>> sd 2:0:0:0: [sda] 25165824 512-byte hardware sectors (12885 MB)
>> sd 2:0:0:0: [sda] Write Protect is off
>> sd 2:0:0:0: [sda] Cache data unavailable
>> sd 2:0:0:0: [sda] Assuming drive cache: write through
>> sda: sda1 sda2
>> sd 2:0:0:0: [sda] Attached SCSI disk
>> sd 2:0:0:0: Attached scsi generic sg1 type 0
>> Making device-mapper control node
>> Scanning logical volumes
>> Reading all physical volumes.
>> This may take a while...
>> Found volume group "VolGroup00" using metadata type lvm2
>>
>> Kernel that fails:
>>
>> Scanning logical volumes
>> scsi 2:0:0:0: Direct-Access
>> scsi target2:0:0: Beginning Domain Validation
>> scsi target2:0:0: Domain Validation skipping write tests
>> scsi target2:0:0: Ending Domain Validation: 1204k
>> scsi target2:0:0: FAST-40 WIDE SCSI 80.0 MB/s ST (25 ns, offset 127)
>> Reading all physical volumes. This may take a while...
>> sd 2:0:0:0: [sda] 25165824 512-byte hardware sectors (12885 MB)
>> sd 2:0:0:0: [sda] Write Protect is off
>> sd 2:0:0:0: [sda] Cache data unavailable
>> sd 2:0:0:0: [sda] Assuming drive cache: write through
>> sd 2:0:0:0: [sda] 25165824 512-byte hardware sectors (12885 MB)
>> sd 2:0:0:0: [sda] Write Protect is off
>> sd 2:0:0:0: [sda] Cache data unavailable
>> sd 2:0:0:0: [sda] Assuming drive cache: write through
>> sda: sda1 sda2
>> sd 2:0:0:0: [sda] Attached SCSI disk
>> sd 2:0:0:0: Attached scsi generic sg1 type 0
>> Activating logical volumes
>> Volume group "VolGroup00" not found
>> Unable to access resume device (/dev/VolGroup00/LogVol01)
>>
>> Note that in the failing kernel, LVM starts scanning for logical
>> volumes before the SCSI devices have stabilized! It doesn't find any
>> PVs, and so it doesn't find "VolGroup00". That's the killer. Now look
>> closely at the kernel that booted, above. What's interspersed with the
>> scsi startup messages? Loading dm-mirror, loading dm-zero, loading
>> dm-snapshot. Then we see the "Scanning logical volumes". Well, guess
>> what: those modules are not present in the latest kernel as modules.
>> They created enough of a time delay that lvm started after the scsi
>> drivers had settled. Now that they are not there, we have a race
>> condition and, if lvm starts too early, lvm can't find the drives.
>> The "Scanning logical volumes" starts exactly where we see the
>> "Loading dm-mirror" in the kernel which does boot. I would call that a
>> smoking gun.
>>
>> What's really interesting (to me) is that if you look at the 2.6.26
>> kernels under F9 (and I'm testing this 2.6.27 kernel under F9 as well
>> as F10), you find ALL of the "Loading dm-*" messages AFTER the scsi
>> drivers have settled. Something has changed in the 2.6.27 kernels:
>> either the scsi drivers are not settling as fast (debugging messages
>> and checks, perhaps) or the insmod is returning sooner (before the
>> drivers have settled), creating this race condition, which did not
>> exist at all in 2.6.26.
>>
>> I think this is a bug in mkinitrd: it's not emitting a wait when it's
>> needed in this case. Down in mkinitrd, around line 1483, is a check
>> for the conditions under which it wants to issue a wait for the scsi
>> to settle. I think that either needs to be made unconditional, or at
>> least expanded to include other scsi devices like the VMware ones. I
>> cheated. Up at line 1411 I changed wait_for_scsi="no" to
>> wait_for_scsi="yes" and rebuilt the initrd. The problem goes away and
>> lvm starts AFTER the scsi devices have settled. Another way to do this
>> might be to force the inclusion of usb-storage (mkinitrd --with-usb),
>> which has its own delay both in loading and settling.
>>
>> > > --
>> > > BeGe
>>
>> > Try uninstalling the kernel and reinstalling it via yum.
>> > I tried several times and succeeded :)
>>
>> No, you didn't. You got lucky and won the race that time. That's the
>> problem with race conditions: sometimes you win through just plain
>> dumb luck. You may well find it fails on a subsequent reboot and
>> you're screwed. Then again, you may well have changed something else
>> that changes the timing, and the probability, and it then works for
>> you.
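Michael's `wait_for_scsi` workaround can be sketched as a short shell session. The variable name is quoted from his message; to keep the sketch harmless it applies the edit to a scratch stand-in file rather than the real `/sbin/mkinitrd`, and the initrd path and kernel version in the trailing comments are illustrative assumptions:

```shell
# Sketch of the workaround described above, demonstrated on a scratch
# copy so nothing on the real system is touched. The wait_for_scsi
# variable is quoted from the message; paths here are examples only.
printf 'wait_for_scsi="no"\n' > /tmp/mkinitrd.frag      # stand-in for the real script
sed -i 's/wait_for_scsi="no"/wait_for_scsi="yes"/' /tmp/mkinitrd.frag
cat /tmp/mkinitrd.frag                                  # prints: wait_for_scsi="yes"

# On a real system you would back up and edit /sbin/mkinitrd the same
# way, then rebuild the initrd for the affected kernel, e.g.:
#   mkinitrd -f /boot/initrd-2.6.27-1.img 2.6.27-1
# or, per the alternative mentioned above, force usb-storage in:
#   mkinitrd --with-usb -f /boot/initrd-2.6.27-1.img 2.6.27-1
```

The scratch-copy approach lets you verify the substitution before touching the real script.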
>>
>> I'll be filing a bugzilla bug on this later if nobody else gets to it
>> first.
>>
>> > Regards,
>> > Antonio
>>
>> Mike
>> --
>> Michael H. Warfield (AI4NB) | (770) 985-6132 | mhw@xxxxxxxxxxxx
>>  /\/\|=mhw=|\/\/            | (678) 463-0932 | http://www.wittsend.com/mhw/
>>  NIC whois: MHW9            | An optimist believes we live in the best of all
>>  PGP Key: 0xDF1DD471        | possible worlds. A pessimist is sure of it!
>
> Michael,
>
> You have a very valid point. My problem was that a partition (not the
> boot partition) was ext4 and was listed as ext4dev, and the filesystem
> name was changed to plain ext4. I had to change /etc/fstab from
> ext4dev to ext4, then reinstall the kernel, and it worked :)
>
> But you are correct about the scsi/dm volume group (LVM) race.
>
> Regards,
>
> Antonio
>
> --
> fedora-test-list mailing list
> fedora-test-list@xxxxxxxxxx
> To unsubscribe:
> https://www.redhat.com/mailman/listinfo/fedora-test-list

--
BeGe
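Antonio's /etc/fstab fix can be sketched as follows. This is a minimal, hypothetical example: the device name and mount point are invented, and the edit is demonstrated on a scratch copy; on a real system you would back up /etc/fstab before changing it.

```shell
# Hypothetical sketch of the ext4dev -> ext4 fstab change described
# above, demonstrated on a scratch copy. Device and mount point are
# invented for this example.
printf '/dev/VolGroup00/LogVol02 /data ext4dev defaults 1 2\n' > /tmp/fstab.frag
sed -i 's/[[:space:]]ext4dev[[:space:]]/ ext4 /' /tmp/fstab.frag
cat /tmp/fstab.frag
# prints: /dev/VolGroup00/LogVol02 /data ext4 defaults 1 2

# On the real system: apply the same sed line to a backed-up /etc/fstab,
# then reinstall the kernel (as Antonio did) so the initrd is rebuilt.
```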