On Sat, Aug 23, 2003 at 09:23:29AM -0600, Andreas Dilger wrote:
> I was a bit worried about it myself, but I think re-replaying the journal
> will be safe, and the journal itself can detect if there is corruption
> (except in the rare case of corruption between the transaction start and
> commit blocks, but we would hit that regardless).

If we were really paranoid about that case, we could create a new
descriptor block format which included a CRC-32 of each data block in the
journal.  I'm not really sure it's worth it, though, since it's not clear
what we would do if the checksum was invalid.  Abort the journal replay
entirely?  Replay only the blocks that have a valid checksum?  The latter
might work, but the filesystem would almost certainly be in a pretty
unhappy state afterwards.

> Even so, we wouldn't normally be "redoing" the journal replay, we would
> just be giving it the chance to replay.  The journal itself will record
> the last transaction that was flushed to the filesystem, so only newer
> transactions will be replayed as it normally would be.

Yeah, I was concerned we might have skipped that last step when replaying
the journal (and just cleared the needs_recovery flag), but it appears to
be safe to set the needs_recovery flag on a clean ext3 filesystem and do
a journal recovery.  The recovery code simply notes that the journal is
empty, and exits.

> We can consider separately if e2fsck should try to "recover" corrupt
> metadata blocks from the journal either by forcing a journal replay of
> all transactions in the journal or by doing the metadata block recovery
> one at a time from the journal on an as-needed basis (probably just
> saving the physical block number in the journal with the logical block
> that it represents when the journal is first loaded).

The problem is that after the journal has been replayed, we lose the
starting block of the previous journal.
So it would be a bit of a trick to find valid descriptor blocks and make
sure the associated data blocks were still valid.  (This would be easier
if we had checksums in the descriptor blocks.)  It's an interesting way
of trying to get a last-chance saving throw for bad block recovery, or
for when it's clear a metadata block is hopelessly corrupted.  In
general, though, it would be hard to program heuristics for deciding
when a potentially out-of-date block in the journal is a better choice
than the corrupt metadata (probably in the inode table) found in the
filesystem.  I could imagine this being useful for manual human
intervention, but I don't know if we could make e2fsck do anything
interesting and safe with it.  It's certainly something interesting to
think about.

						- Ted

_______________________________________________
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users