Hi,

On Mon, 2006-05-29 at 10:09 -0600, Andreas Dilger wrote:

> This is one thing that we have been thinking of for ext3.  Instead of a
> filesystem-wide "error" bit we could move this per-group to only mark
> the block or inode bitmaps in error if they have a checksum failure.
> This would prevent allocations from that group to avoid further potential
> corruption of the filesystem metadata.

Trouble is, individual files can span multiple groups easily.  And one
of the common failure modes is failure in the indirect tree.  What
action do you take if you detect that?

There is fundamentally a large difference between the class of errors
that can arise due to EIO --- simple loss of a block of data --- and
those which can arise from actual corrupt data/metadata.  If we detect
the latter and attempt to soldier on regardless, then we have no idea
what inconsistencies we are allowing to be propagated through the
filesystem.

That can easily end up corrupting files far from the actual error.  Say
an indirect block is corrupted; we delete that file, and end up freeing
a block belonging to some other file on a distant block group.  Ooops.
Once that other block gets reallocated and overwritten, we have
corrupted that other file.

*That* is why taking the fs down/readonly on failure is the safe option.

The inclusion of checksums would certainly allow us to harden things.
In the above scenario, failure of the checksum test would allow us to
discard corrupt indirect blocks before we could allow any harm to come
to other disk blocks.  But that only works for cases where the checksum
notices the problem; if we're talking about possible OS bugs, memory
corruption etc. then it is quite possible to get corruption in the
in-memory copy, which gets properly checksummed and written to disk, so
you can't rely on that catching all cases.
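
To make the indirect-block scenario concrete, here is a toy sketch in
Python.  It is not ext3 code: the block numbers, the choice of CRC32,
the on-disk layout and the helper names (pack_indirect, unpack_indirect,
delete_file) are all assumptions made up for illustration.  It only
models an "indirect block" as a list of block numbers followed by a
checksum, and a set of allocated blocks.

```python
import struct
import zlib


def pack_indirect(pointers):
    """Serialize block numbers and append a CRC32 of the payload."""
    payload = struct.pack(f"<{len(pointers)}I", *pointers)
    return payload + struct.pack("<I", zlib.crc32(payload))


def unpack_indirect(raw, verify=True):
    """Return the block numbers, optionally rejecting a bad checksum."""
    payload = raw[:-4]
    (stored,) = struct.unpack("<I", raw[-4:])
    if verify and zlib.crc32(payload) != stored:
        raise ValueError("indirect block checksum mismatch")
    return list(struct.unpack(f"<{len(payload) // 4}I", payload))


def delete_file(raw_indirect, in_use, verify=True):
    """Free every block the indirect block points at."""
    for blk in unpack_indirect(raw_indirect, verify):
        in_use.discard(blk)


# Hypothetical layout: file A owns blocks 10-12; file B owns block 900
# in a distant group.
allocated = {10, 11, 12, 900}
good = pack_indirect([10, 11, 12])

# On-disk corruption: the third pointer now reads 900, checksum is stale.
bad = good[:8] + struct.pack("<I", 900) + good[12:]

# 1. No verification: deleting A frees B's block and leaks block 12.
naive = set(allocated)
delete_file(bad, naive, verify=False)

# 2. With verification: the corrupt block is rejected, nothing is freed.
checked = set(allocated)
try:
    delete_file(bad, checked)
    caught = False
except ValueError:
    caught = True

# 3. Pointer corrupted in memory *before* checksumming: the checksum is
#    valid, so verification passes and B's block is freed anyway.
in_memory = set(allocated)
delete_file(pack_indirect([10, 11, 900]), in_memory)

print(sorted(naive), caught, sorted(checked), sorted(in_memory))
```

Case 1 is the "Ooops" above, case 2 is what a checksum buys us, and
case 3 is the in-memory corruption that a checksum cannot catch.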
--Stephen