Re: topics for the file system mini-summit

"Stephen C. Tweedie" <sct@xxxxxxxxxx> · Wed, 07 Jun 2006 11:10:46 +0100

Hi,

On Mon, 2006-05-29 at 10:09 -0600, Andreas Dilger wrote:

> This is one thing that we have been thinking of for ext3.  Instead of a
> filesystem-wide "error" bit we could move this per-group to only mark
> the block or inode bitmaps in error if they have a checksum failure.
> This would prevent allocations from that group to avoid further potential
> corruption of the filesystem metadata.

Trouble is, individual files can span multiple groups easily.  And one
of the common failure modes is failure in the indirect tree.  What
action do you take if you detect that?

There is fundamentally a large difference between the class of errors
that can arise due to EIO --- simple loss of a block of data --- and
those which can arise from actual corrupt data/metadata.  If we detect
the latter and attempt to soldier on regardless, then we have no idea
what inconsistencies we are allowing to be propagated through the
filesystem.

That can easily end up corrupting files far from the actual error.  Say
an indirect block is corrupted; we delete that file, and end up freeing
a block belonging to some other file on a distant block group.  Oops.
Once that other block gets reallocated and overwritten, we have
corrupted that other file.
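
The failure sequence above can be walked through with a toy model.  This
is only a sketch under stated assumptions, not ext3 code: blocks are plain
integers, the allocation bitmap is a Python set, and free_file() and
allocate() are hypothetical helpers invented for the illustration.

```python
# Toy model of the scenario: a corrupt indirect block causes a delete to
# free a block that belongs to a different file.  All names are hypothetical.

TOTAL_BLOCKS = 100


def free_file(indirect_block, in_use):
    """Free every block listed in the indirect block, trusting it blindly."""
    for blk in indirect_block:
        in_use.discard(blk)


def allocate(in_use):
    """Hand out the lowest-numbered free block and mark it in use."""
    for blk in range(TOTAL_BLOCKS):
        if blk not in in_use:
            in_use.add(blk)
            return blk
    raise RuntimeError("no free blocks")


# Assume a completely full disk so the freed blocks are the only free ones.
in_use = set(range(TOTAL_BLOCKS))
file_a = [10, 11, 12]   # file A's real data blocks
file_b = [90, 91]       # file B's data blocks, in a distant group

# Corruption: the last pointer in A's indirect block now points into file B.
corrupt_indirect_a = [10, 11, 91]

# Deleting file A frees whatever the corrupt indirect block lists.
free_file(corrupt_indirect_a, in_use)

# A new file is written and picks up the freed blocks.
new_file = [allocate(in_use) for _ in range(3)]

# Block 91 is now claimed by both file B and the new file.
shared = set(new_file) & set(file_b)
print("new file blocks:", new_file)
print("blocks shared with file B:", shared)
print("file A block 12 leaked (still marked in use):", 12 in in_use)
```

Nothing in the delete path looks wrong locally: every step is a valid
operation on a valid block number.  The damage only shows up later, in a
file that was never touched.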

*That* is why taking the fs down/readonly on failure is the safe option.

The inclusion of checksums would certainly allow us to harden things.
In the above scenario, failure of the checksum test would allow us to
discard corrupt indirect blocks before we could allow any harm to come
to other disk blocks.  But that only works when the checksum actually
notices the problem.  With OS bugs, memory corruption etc. it is quite
possible for the in-memory copy to be corrupted first and then properly
checksummed and written to disk, so you can't rely on checksums catching
all cases.
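
The difference between the two cases can be shown in a few lines.  This is
a minimal sketch that assumes a CRC32 appended to each block; that layout
is chosen for illustration and is not taken from any ext3 proposal, and
write_block() and verify_block() are hypothetical helpers.

```python
import zlib


def write_block(payload: bytes) -> bytes:
    """Append a CRC32 of the payload, standing in for an on-disk checksum."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def verify_block(block: bytes) -> bool:
    """Recompute the CRC32 over the payload and compare with the stored one."""
    payload, stored = block[:-4], block[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == stored


# An "indirect block" holding four 32-bit block pointers.
pointers = [100, 101, 102, 103]
payload = b"".join(p.to_bytes(4, "little") for p in pointers)

# Case 1: the block is damaged after it was checksummed (e.g. on the media).
on_disk = bytearray(write_block(payload))
on_disk[0] ^= 0x01                      # single bit flip in the first pointer
media_corruption_detected = not verify_block(bytes(on_disk))

# Case 2: the in-memory copy is damaged before the checksum is computed.
in_memory = bytearray(payload)
in_memory[0] ^= 0x01                    # same bit flip, but earlier
written = write_block(bytes(in_memory)) # checksum now covers the bad data
memory_corruption_detected = not verify_block(written)

print("corruption after checksumming detected: ", media_corruption_detected)
print("corruption before checksumming detected:", memory_corruption_detected)
```

In the second case the checksum is perfectly valid; it just vouches for
the wrong data.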

--Stephen
