A solution to Kernel Panic ... on ext3 only!

tytso@mit.edu (Theodore Ts'o) · Wed, 22 May 2002 12:32:38 -0400

On Mon, May 20, 2002 at 04:57:19PM +0800, Uwe Dippel wrote:
> In any case, I did the fsck as prescribed, but abandoned the effort
> after having to type the 'Y' for more than 100 times; restarting fsck
> with '-y'. *Never* ever do this on ext3, as you will see later!
> It started but then informed me about " ... too many errors" or so.

There is no "too many errors" printed by e2fsck.  The closest is "too
many illegal blocks" in an inode, which is followed by an offer to
clear the inode entirely.  If an inode has a corrupted indirect block,
this can cause e2fsck to offer to give up on the inode altogether.
This is normal behaviour, and shouldn't have caused any problem (unless
the system really needed the inode which had been corrupted in order
for its boot scripts, of course).

> Next, after reboot, it came with a kernel-panic: No init found.
> This is almost the time for re-install, isn't it!? No, I tried the
> repair before. Bad luck, while reading my nice root-partition (hda6), it
> complained "Error mounting filesystem on hda6: Invalid argument", and
> "You don't have any Linux partitions. Press return ..."; though I could
> *see* all files on hda6 nicely at the shell. I checked fstab: okay. I
> could even mount /dev/hda6 to /mnt/help; looking pretty sane. Though I
> was losing out on my sanity ... !

What distribution are you using (e.g., RedHat, Mandrake, Debian,
etc.), and do you know how the boot scripts were trying to mount the
root partition?  Based on what you reported, it sounds like they
were only trying to mount it as ext3, and not as anything else.
Normally the initrd scripts are set up to load the ext3 module, if
necessary, and then attempt a mount of the filesystem without
specifying the filesystem type.  That way, the kernel will
automatically attempt to mount the filesystem first as ext3, and then
if that fails, it will fall back and attempt a mount as ext2.
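
As a rough illustration of the fallback just described (a sketch only,
not the actual kernel or initrd code): the helper below tries a list of
filesystem types in order and stops at the first one that mounts.  The
names mount_with_fallback and fake_mount are hypothetical, and
fake_mount merely simulates a filesystem whose journal is unusable, so
that ext3 is refused with "Invalid argument" while ext2 succeeds.

```python
import errno


def mount_with_fallback(device, target, fstypes, mount_fn):
    """Try each filesystem type in order; return the first one that mounts.

    mount_fn is a hypothetical callable (device, target, fstype) that
    raises OSError on failure; it stands in for the real mount operation.
    """
    last_error = None
    for fstype in fstypes:
        try:
            mount_fn(device, target, fstype)
            return fstype
        except OSError as exc:
            # Remember the failure and fall through to the next type.
            last_error = exc
    raise OSError(errno.EINVAL,
                  "no filesystem type could mount " + device) from last_error


def fake_mount(device, target, fstype):
    """Simulated mount: ext3 is rejected as if the journal were unusable."""
    if fstype == "ext3":
        raise OSError(errno.EINVAL, "Invalid argument")
    # Any other type is treated as a successful mount in this simulation.


if __name__ == "__main__":
    print(mount_with_fallback("/dev/hda6", "/mnt/help",
                              ["ext3", "ext2"], fake_mount))
```

With the list ["ext3", "ext2"] the simulated mount ends up as ext2; with
only ["ext3"] in the list the helper raises OSError, which corresponds to
the "Invalid argument" failure quoted above when nothing else is tried.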

> The "fsck -y" (see above) had not been able to handle all the errors and
> made the journal unusable. This is why all rescue and booting ended in
> disarray: the corrupted journal made the partition look invalid as ext3,
> though it was not so bad. I only had to convert it to ext2, have it
> repair all the errors and finally recreate the journal; effectively
> reconvert it to ext3. It has been running ever since without problem.
> I am even pondering to consider that behaviour a bug, since a somewhat
> minor problem made things worse (kernel panic!) unnecessarily. It seems
> the journal got corrupted not by the outage but by simply too many
> automated 'Y' *during repair* !? At least, it had been okay for the
> first boot after the outage. Remarkable!

There are some filesystem corruptions that will cause e2fsck to offer
to delete the journal, after which it will explicitly state that the
journal inode has been removed, and that the filesystem has been
converted back to ext2.  It doesn't take a lot of filesystem errors,
and this will happen with or without "fsck -y" (that's just a red
herring).  Certain specific filesystem corruptions simply corrupt
the journal file, and cause e2fsck to need to remove it.
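
Whether a filesystem currently counts as ext3 or plain ext2 is recorded
in the superblock's feature flags, so it can be checked directly.  The
sketch below assumes the standard ext2/ext3 on-disk layout (superblock
at byte offset 1024, magic 0xEF53 at offset 56, compat and incompat
feature words at offsets 92 and 96); the function name
describe_superblock is hypothetical, and this is an illustration rather
than a substitute for the e2fsprogs tools.

```python
import struct

SUPERBLOCK_OFFSET = 1024      # the primary superblock starts 1 KiB into the device
EXT2_MAGIC = 0xEF53           # s_magic, shared by ext2 and ext3
COMPAT_HAS_JOURNAL = 0x0004   # s_feature_compat bit: a journal is present
INCOMPAT_RECOVER = 0x0004     # s_feature_incompat bit: journal needs replay


def describe_superblock(image):
    """Report journal-related flags from the first bytes of a filesystem.

    image is a bytes-like object holding at least the first 2048 bytes
    of the device or image file.
    """
    sb = bytes(image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024])
    if len(sb) < 100:
        raise ValueError("image too short to contain a superblock")
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/ext3 superblock")
    compat, incompat = struct.unpack_from("<II", sb, 92)
    return {
        "has_journal": bool(compat & COMPAT_HAS_JOURNAL),
        "needs_recovery": bool(incompat & INCOMPAT_RECOVER),
    }
```

On a live system the input would be the first 2048 bytes read from the
device (which normally requires root).  If has_journal comes back False,
the filesystem is ext2 as far as the kernel is concerned, which is the
state e2fsck leaves behind after it removes the journal inode.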

It's possible that there might have been some subtle filesystem
corruption where e2fsck doesn't clear out the journal inode and
convert the filesystem back to ext2, but you haven't given us enough
information to know exactly what the filesystem corruption was.  (See
the man page for e2fsck, in the REPORTING BUGS section, for a
discussion of the sort of information that is really needed for me
to be able to reproduce, find, and fix e2fsck bugs.)

As I said, "fsck -y" is a red herring.  All -y does is the equivalent
of your typing 'y' to every question asked by e2fsck; it just saves a
little keyboard wear and tear.  The real question is how your
filesystem was corrupted, and how e2fsck reacted to that particular
form of filesystem corruption.  E2fsck *should* be able to handle
just about any form of filesystem corruption, for ext2 or ext3, and it
surely shouldn't make things worse.  But of course, no software is
bug-free, and you may have found some new and innovative way for your
filesystems to be corrupted, which e2fsck doesn't deal with correctly.
But I would need a lot more information to be able to figure out
exactly what happened.

						- Ted