On Thu, Dec 05, 2002 at 08:02:49AM +0100, Stephan Wiehr wrote:
> On Wed, Dec 04, 2002 at 10:00:52AM -0500, Theodore Ts'o wrote:
> > > Even clearing the has_journal and needs_recovery flags produced the same
> > > output using fsck as above.
> >
> > The exact same messages?  Including an error about reading the journal
> > superblock?  Are you sure about this?  That doesn't make any sense at
> > all....
>
> I was confused as well since I thought this would bring me back to ext2 in
> some way.

Ah, OK, I see what's going on.  The problem is that because the journal_inum
field is non-zero, e2fsck tries to load the journal before it tries to
reconcile the fact that the feature flags say there is no journal, even
though the superblock's journal_inum field points to one.  This is normally
not a problem, since if the user tells e2fsck to blow away the journal, the
loaded journal is discarded, and if the user tells e2fsck to fix the feature
flags, things proceed normally.  However, if the journal load fails because
the journal inode is corrupted, e2fsck doesn't do the right thing.  OK,
that's an e2fsck bug, and I can fix it easily enough just by reordering a
few lines of code.

The workaround is relatively simple: use debugfs to clear the journal_inum
field, via the command "set_super_value journal_inum 0", and then e2fsck
will stop blowing out due to the bad journal inode.  (A rough sketch of the
debugfs session is below.)

However, before you do this, it might be prudent to see how much damage was
done to the inode table.  As Andreas Dilger pointed out, apparently every
other byte, at least in the part of the inode table containing the journal
inode, is 0xFF.  That does not bode well, and was almost certainly caused by
a hardware failure of some sort.  It might be worth examining some other
inode numbers to see how extensive the damage is.  Each inode is 128 bytes
long, and IDE disk sectors are 512 bytes, so if you're really lucky, only 4
consecutive inodes will be damaged.  However, it's much more likely that at
least a filesystem block's worth (4096 bytes, or 32 inodes) were lost, and
if you're really unlucky, it may be a lot more than that.

Also worth considering before you do anything is the cause of the
corruption.  It could have been caused by the controller or the IDE disk
going temporarily insane, in which case hopefully it won't be repeated, but
if it is repeatable, doing an image backup will probably be a good idea.
Another possibility is that if you had a power failure, one of the things
which might have happened is that the memory went insane as the +5 voltage
rail dropped down to zero, but the DMA engine and disk drive were able to
keep going long enough to write garbage to the disk drive.  What happened
prior to the filesystem crash?  Did you have a power failure, or did someone
hit the Big Red Switch by accident?

(Note: normally the fact that ext3, unlike jfs and reiserfs, uses physical
block journalling helps to protect against this situation, since disk blocks
which were being actively written at the time of a power failure are
extremely likely to be in the journal as well, so when the journal is
replayed, the damage is undone.  This doesn't help, though, when the part of
the inode table containing the location of the journal is smashed, so that
the system can no longer find the journal.... hence my comment about
possibly storing the location of the journal in a redundant location as an
additional safety measure.)
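(For concreteness, something like the following -- a rough, untested sketch,
assuming the usual inode numbers: the journal is normally inode 8 and the
root directory is inode 2 -- would let you look at the damage and then clear
the journal_inum field.  Run it against the copy, not the original
partition:

    debugfs -w /dev/hdb2
    debugfs:  stat <8>          (the journal inode -- expect garbage)
    debugfs:  stat <2>          (the root directory inode, for comparison)
    debugfs:  set_super_value journal_inum 0
    debugfs:  quit

Obviously substitute whichever device you end up doing the recovery work
on.)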
> Before I do anything like writing to the fs I'd just like to check I'm doing
> things right, so here is what I did so far:
> The partition that REALLY crashed is /dev/hdb1 which is 2 GB. Moving some
> data freed /dev/hdb2 (2,5 GB) for 'backup' so I did a
> 'dd if=/dev/hdb1 of=/dev/hdb2 bs=1024 conv=sync' (BTW: Does the bs of dd has
> something to do with the blocksize of the fs - which is 4096 - don't know
> about this)
> So /dev/hdb1 is still 'virgin' concerning the error state (I hope!) and all
> experimental stuff I did on /dev/hdb2 (like e2salvage or trying to mount it
> as ext2). Still having the originally crashed partition do I need the
> Imagefile of e2image or could I skip this since diskspace has now become rare
> on that machine.

Ah, good.  I see you've already done the backup.  OK, first of all, at this
point I won't need the e2image.  I'm pretty sure I understand why e2fsck
acted the way it did, and I know what I need to do to make e2fsck more
robust in the future.

In answer to your question about the dd blocksize: no, the blocksize used by
dd doesn't have to be the same as the blocksize used by the filesystem.
Dd's blocksize determines the size of the chunks it reads and writes when
doing its I/O.  Using a smaller blocksize will slow down the copy slightly,
but in the case where there is a disk block error, you may recover more
data, since dd will retry at a smaller granularity.  Of course, it will only
keep going past errors if the dd command line has the conv option
"conv=noerror,sync".  Without the "noerror" declaration, dd will abort if a
disk I/O error is reflected up into userspace.  So if the dd command
reported any errors, you didn't get a full copy of the filesystem image, and
you may want to retry the disk copy before trying to recover the filesystem.

Once you're sure you're working on a clean copy of the filesystem, use
debugfs -w to clear the journal flags and the journal inode number, and then
try e2fsck.  That will hopefully recover the filesystem into a consistent
state, but let me warn you not to set your expectations too high.  Between
not being able to replay the journal and part of the inode table getting
smashed (so, among other things, the root directory is gone), you will
almost certainly have a lot of directories ending up in the lost+found
directory.  So you'll probably be able to recover some of your data, but
don't be too surprised if some number of files end up being lost.

Good luck!!

						- Ted
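P.S.  Just to put the steps in one place, the whole sequence would look
roughly like this -- untested, and assuming you redo the copy onto /dev/hdb2
and do all the recovery work on that copy; adjust the device names to
whatever you actually use:

    dd if=/dev/hdb1 of=/dev/hdb2 bs=512 conv=noerror,sync
    debugfs -w /dev/hdb2
    debugfs:  set_super_value journal_inum 0
    debugfs:  quit
    e2fsck -f /dev/hdb2

The small bs=512 means a bad sector costs you one sector's worth of data
rather than a whole larger chunk, and "e2fsck -f" forces a full check even
if the superblock claims the filesystem is clean.  (And if the fresh copy
brings the has_journal/needs_recovery flags back, clear them again the same
way you did before.)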