Re: [PATCH] mkinitrd rescue mode

Ville Herva <vherva@xxxxxxxxxx> · Wed, 17 Aug 2005 16:24:04 +0300

Sorry for coming into the discussion this late. I would just like to give one
user's perspective. Feel free to ignore.

First, I think this is a valuable idea.                                         

Example: just recently I had a problem booting a RHEL3 installation
with root on LVM, see:

http://groups-beta.google.com/group/linux.kernel/browse_thread/thread/45c538b12f6e3c51/2988b9d6415a9396?lnk=st&q=lvm2+vherva&rnum=1&hl=en#2988b9d6415a9396     

After 2.6.10-ac8 -> 2.6.12.5 upgrade, the initramfs init script no longer
detected the lvm partitions before attempting to mount rootfs. A simple
"sleep 5" in the init script between lvm scan and mount "solved" the
problem. Why? No idea. This is with the new mkinitrd package from Fedora.
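
To illustrate the workaround above: instead of a fixed "sleep 5", the init
script could poll for the root device node and give up after a bounded number
of tries. This is only a sketch in POSIX sh. The helper name wait_for_path and
the device path in the usage comment are hypothetical, and it assumes the
initramfs init is run by a POSIX-like shell, which may not be true of the
stock interpreter in the Fedora initrd.

```shell
# Hypothetical helper (not part of mkinitrd): poll for a path instead of
# sleeping for a fixed time.
# Usage: wait_for_path PATH [TRIES] [DELAY_SECONDS]
wait_for_path() {
    path=$1
    tries=${2:-10}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -e becomes true once the node exists, e.g. the root LV device
        [ -e "$path" ] && return 0
        sleep "$delay"
        i=$((i + 1))
    done
    # Final check, so a node that appeared during the last sleep still counts
    [ -e "$path" ]
}

# Example call between the lvm scan and the mount (path is an assumption):
#   wait_for_path /dev/VolGroup00/LogVol00 10 1 || echo "root LV not found"
```

This does not explain why the delay is needed; it only bounds the wait and
makes the failure visible before the mount is attempted.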

(Why 2.6.x on RHEL3? Unsolved data corruption bugs with the RHEL3 2.4.x
kernel and driver support. Long story. Mostly 2.6 works great on RHEL3.)

I know this is a bad example, because you will dismiss it ("you're an idiot
to run 2.6 on RHEL3 - run stock kernel -> no problem"), but this is not the
only case where this can happen. An LVM config change, a kernel upgrade that
changes hw detection, etc etc.

Anyway, when booting I only got the "mount: error 6 mounting ext3" error. Not
very helpful. I added debug to the initramfs initscript, but the lvm
messages didn't tell much, appeared randomly on screen out of sync, and
scrolled off the screen.

I didn't have a serial console at hand, and netconsole doesn't work at that
phase (would be nice - I did try with statically compiled netconsole and a
grub command line option, but had no luck - that would probably work if
configured correctly, though. For a stock installation where netconsole is a
module, it might make sense to add the module to the rescue initrd if it is
ever implemented.)

The rescue cd option doesn't help, since it's the kernel I'm booting that is
failing.

In such a case a rescue mode would be *very* useful.

Peter Jones <pjones redhat com> wrote:                                          
> I don't buy that at all.  If, as you state below, we're seeing lots of
> cases where machines don't boot after upgrades, then there's a bug in
> something.  Period.

Sure this is a bug somewhere. Even if it will be fixed eventually
(doubtful), it doesn't help me much when I need to get the server up. To me,
it makes sense to be prepared for risks, even if they "shouldn't" happen.

My point is that these boot problems *do* happen. This is far from the
only such problem I've seen - LVM for example seems to produce them fairly
often. Eventually I get them solved, but it can take hours. I don't call RH
about them (if there's a bug I report it, of course), and even if I did I
would still probably need to get some decent debug information. RH support is
not exactly psychic either.

A rescue mode with some tools would save a *whole* lot of valuable time in
these cases.

I hope the "add-in" rescue init cpio idea can be considered, even though it
may add some complexity.

just my two cents,

-- v --

v@xxxxxx
