Re: another seriously corrupt ext3 -- pesky journal

Erez Zadok <ezk@xxxxxxxxxxxxx> · Sun, 24 Aug 2003 22:21:42 -0400

In message <20030824195744.GA5164@think>, "Theodore Ts'o" writes:
> On Sat, Aug 23, 2003 at 09:23:29AM -0600, Andreas Dilger wrote:
> > I was a bit worried about it myself, but I think re-replaying the journal
> > will be safe, and the journal itself can detect if there is corruption
> > (except in the rare case of corruption between the transaction start and
> > commit blocks, but we would hit that regardless).
> 
> If we were really paranoid about that case we could create a new
> descriptor block format which included CRC-32 of each data block in
> the journal.  I'm not really sure it's worth it though, since it's not
> clear what we would do if the checksum was invalid.  Abort the journal
> replay entirely?  Only replay blocks that have a valid checksum?  The
> latter might work, but the filesystem would almost certainly be in a
> pretty unhappy state afterwards.

Yes, checksumming alone can only tell you when some piece of data lost its
integrity, but not what the good value was.  For that, you need ECCs or some
form of redundancy (such as RAID, backup/duplicate journal blocks, etc.).
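
To make that distinction concrete, here is a minimal sketch (Python rather
than kernel C, and not anything ext3 actually does).  It assumes a
hypothetical layout in which each journal data block has a stored CRC-32 and,
optionally, a duplicate copy; BLOCK_SIZE, checksum() and recover_block() are
names made up for illustration.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def checksum(block: bytes) -> int:
    """CRC-32 of one journal data block."""
    return zlib.crc32(block) & 0xFFFFFFFF


def recover_block(stored_crc, primary, duplicate=None):
    """Return a copy whose CRC matches stored_crc, or None if no copy does.

    With only the primary copy, a mismatch can be detected but not repaired.
    With a duplicate, a damaged primary can be replaced by the good copy.
    """
    for copy in (primary, duplicate):
        if copy is not None and checksum(copy) == stored_crc:
            return copy
    return None


# A known-good block and the checksum recorded for it.
good = bytes(range(256)) * (BLOCK_SIZE // 256)
crc = checksum(good)

# Simulate corruption by flipping a single bit in the primary copy.
damaged = bytearray(good)
damaged[100] ^= 0x01
damaged = bytes(damaged)

# Checksum alone: the damage is noticed, but there is nothing to replay.
detect_only = recover_block(crc, damaged)

# Checksum plus a duplicate block: the good contents can be recovered.
with_duplicate = recover_block(crc, damaged, duplicate=good)
```

In the checksum-only case the caller is left with the same choice Ted
describes (abort the replay, or skip the bad block); the duplicate is what
turns detection into recovery, at the cost of writing every block twice.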

I know you are concerned about further slowing down ext3, but I think these
are issues best left to individual sites to determine -- that is, how much
performance overhead they are willing to accept for better
reliability/integrity.  Those are some of the same concerns that make someone
choose raid0/1 vs. raid5, or when to turn on various forms of ext3
journaling.

Personally, I'd like to see additional options available in ext3 that will
provide better recovery chances from the kinds of f/s corruptions I've had
recently; it'd be up to me which of these options to turn on/off, and I'd
have to accept that some options may consume more cpu/storage resources than
others.

Some of the options I'd like to see have to do with more redundancy:
duplicate journal inode, duplicate journal data, etc.  If we take this step
further, we're going to get a versioning f/s.  Has there been any discussion
on this list about versioning some of the types of data in ext3 (or is that
such a huge change that it's better left for the ext4-developers list)? :-)

BTW, I'm working on a stackable versioning f/s with built-in policies that
let users choose the tradeoff between resource consumption and the amount
of versioning used.

Erez.

_______________________________________________

Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users