Re: ext3 dead after testing 2.6.0-test5

"Theodore Ts'o" <tytso@xxxxxxx> · Wed, 10 Sep 2003 17:33:35 -0400

On Wed, Sep 10, 2003 at 10:27:15PM +0200, Norbert Preining wrote:
> 
> I think the most important point is that I had to clear the entire inode
> 8 to get as far as fsck would *not* go into an endless (at least very
> long) loop restarting itself. Thus, the journal was destroyed, on next
> mount the fs was mounted as ext2, then I could do fsck which gave really
> loads of error message I know (duplicate blocks, etc, all I have ever
> seen ;-), but at the end it succeeded.
> 

What version of e2fsck are you using?  The newer versions of e2fsck
have more checks that should offer to clear the journal inode (so you
don't have to do this manually via debugfs).  Of course, there might
be some bugs still, but do please try the latest e2fsck first.  (The
latest e2fsck has a lot of bug fixes and improvements, so I strongly
recommend upgrading; see the Release Notes for more information.)

If someone finds a filesystem where e2fsck doesn't offer to clear the
journal, I would be very interested in getting a compressed raw
e2image dump file of the filesystem so I can reproduce it and create a
test case.
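
For anyone who wants to script the dump, here is a minimal sketch that
builds the shell pipeline for a compressed raw e2image file.  The helper
name e2image_dump_pipeline and the example device and output names are
assumptions for illustration; the pipeline itself uses e2image's -r
(raw) option writing to standard output, piped through bzip2.

```python
import shlex


def e2image_dump_pipeline(device, out_path):
    """Return a shell command that writes a bzip2-compressed raw e2image dump.

    `device` and `out_path` are supplied by the caller; both are quoted so
    that unusual characters in the names cannot break the pipeline.
    """
    return (
        f"e2image -r {shlex.quote(device)} - "
        f"| bzip2 > {shlex.quote(out_path)}"
    )


if __name__ == "__main__":
    # Hypothetical device name; substitute the filesystem you are reporting on.
    print(e2image_dump_pipeline("/dev/hda1", "hda1.e2i.bz2"))
```

The resulting string can be pasted into a root shell or handed to
subprocess.run(..., shell=True).  Take the dump before running e2fsck,
so the image still contains the original corruption.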

Similarly, if you can find a test case where (IN THE ABSENCE OF
HARDWARE ERRORS) a single run of e2fsck is not capable of fixing all
of the filesystem's problems, I also want to know about it.  (That is, if you
run e2fsck -f from the command line, and it fixes some errors, and
then you run e2fsck with the -f option a second time, it should not
find any further errors.  If it does, by definition there is a bug in
e2fsck, and I want to know about it.)  If you find such a case, at
minimum I would appreciate getting a full transcript of the e2fsck
output, and preferably, a compressed raw e2image dump before the first
e2fsck run.  
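
The two-run check described above can be automated.  What follows is a
minimal sketch, not part of e2fsprogs: the function name
second_pass_is_clean, the injectable run parameter, and the use of -y
(answer yes to every prompt, so the check can run unattended) are all
assumptions.  It relies on e2fsck's documented exit status, where 0
means no errors were found and values of 4 and above mean errors were
left uncorrected or e2fsck itself failed.

```python
import subprocess


def second_pass_is_clean(device, run=subprocess.run):
    """Run `e2fsck -f` twice and report whether the second pass is clean.

    `run` defaults to subprocess.run but can be replaced with a stub, so
    the logic can be exercised without touching a real device.
    """
    cmd = ["e2fsck", "-f", "-y", device]

    first = run(cmd)
    # Status 0-3: the pass completed (possibly after fixing errors).
    # Status 4 and above: uncorrected errors or an operational failure.
    if first.returncode >= 4:
        raise RuntimeError(f"first e2fsck pass failed: status {first.returncode}")

    second = run(cmd)
    # Anything other than 0 here means the second pass still found something.
    return second.returncode == 0
```

If this returns False on hardware that is otherwise known to be good,
that is the kind of case worth reporting, together with the transcript
and the e2image dump taken before the first run.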

The "in the absence of hardware errors" caveat is why having a
compressed raw e2image dump file is so important.
This way we can uncompress the filesystem metadata on another hard
disk, and try to replicate the problem.  If we can't replicate it,
then it's likely caused by a hardware problem or a device driver
problem, such that two reads from a single block result in different
results, or a read, write, read sequence to a block doesn't result in
reading the same data which was written.  E2fsck fundamentally assumes
that the device driver, disk controller, and disk drive are sane, and
that data written stays written, and data read at one time stays the
same until modified by an intervening write.  If these assumptions are
violated, all guarantees are off.
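
The two assumptions can be written down as simple checks.  This is only
an illustrative sketch: the function names, the 4096-byte block size,
and the read count are assumptions, and on a real device repeated reads
are usually served from the page cache, so a check like this would need
to bypass the cache before it could say anything about the hardware.
The write test must only ever be pointed at a scratch file or device.

```python
import os


def block_is_stable(path, block_no, block_size=4096, reads=2):
    """Read one block several times; True if every read returned the same bytes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = block_no * block_size
        seen = {os.pread(fd, block_size, offset) for _ in range(reads)}
    finally:
        os.close(fd)
    return len(seen) == 1


def write_then_read_matches(path, block_no, data, block_size=4096):
    """Write `data` at a block offset, flush it, and check it reads back intact.

    Destructive: overwrites the target block.  Use a scratch file only.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        offset = block_no * block_size
        os.pwrite(fd, data, offset)
        os.fsync(fd)
        return os.pread(fd, len(data), offset) == data
    finally:
        os.close(fd)
```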

> Hmm. Then when are these error messages about 
> 	journal aborted
> or something similar from, when I booted 2.6.0-test5, while with test4
> it was working.

If the filesystem code detects a problem, which can be caused by a
filesystem inconsistency on disk, or a hardware error, or a device
driver problem of some sort, then the filesystem throws an error.
What happens at this point depends on how the filesystem is
configured.  The filesystem can be told to ignore the error ("don't
worry, be happy") and just continue on.  The filesystem can be set so
that if a filesystem inconsistency is detected, the system is forced
to panic and reboot.  Finally, the filesystem can be set to remount
read-only.  In that case, journal writes are stopped (which is the
source of the journal aborted message), and the filesystem is
remounted read-only.  There are generally other messages before the
journal aborted message, which indicate what is really going on.
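
The three behaviors correspond to the continue, remount-ro, and panic
settings accepted by tune2fs -e and by the errors= mount option.  The
sketch below only builds the command line and option string; the helper
names and the /dev/hdXN placeholder are hypothetical.

```python
ERROR_BEHAVIORS = {
    "continue": "ignore the error and carry on",
    "remount-ro": "stop journal writes and remount the filesystem read-only",
    "panic": "force a kernel panic",
}


def _check(behavior):
    """Reject anything that is not one of the three known settings."""
    if behavior not in ERROR_BEHAVIORS:
        raise ValueError(f"unknown error behavior: {behavior!r}")


def tune2fs_errors_cmd(device, behavior):
    """Return the argv that stores the error behavior in the superblock."""
    _check(behavior)
    return ["tune2fs", "-e", behavior, device]


def mount_errors_option(behavior):
    """Return the mount option that sets the behavior for a single mount."""
    _check(behavior)
    return f"errors={behavior}"


if __name__ == "__main__":
    # /dev/hdXN is a placeholder, not a real device name.
    print(" ".join(tune2fs_errors_cmd("/dev/hdXN", "remount-ro")))
    print(mount_errors_option("remount-ro"))
```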

						- Ted

_______________________________________________

Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users