On 2011-04-08, at 1:27 PM, Mingming Cao wrote:
> On Wed, 2011-04-06 at 15:44 -0700, Darrick J. Wong wrote:
>> Hi all,
>>
>> I spent last week analyzing a client's corrupted ext3 image to see if I could
>> determine what had gone wrong and caused the filesystem to blow apart.  As best
>> as I could tell, a data block got miswritten into a different sector ... which
>> happened to be an indirect block.  Some time later the indirect block, which
>> now pointed at one of the inode tables (among other things that shouldn't ever
>> become file data) was loaded as part of a file write, which caused that inode
>> table to be blown to smithereens.  Just for fun I tried reading from one of
>> these busted-inode files and ... failed to encounter any errors.  Somehow, they
>> didn't find it funny that ext3 would read block numbers from a table with the
>> contents "ibm.com" with a straight face.  Fortunately there were backups. :)
>>
>> The client at this point asked if ext4 would do a better job of sanity
>> checking, which got me to wonder why ext4 checksums block groups but not
>> inodes.  It's on Ted's todo list, but apparently nobody wrote any patch, so I
>> did.  The following two patches are a first draft of adding inode checksum
>> support to both the kernel driver and to the various e2fsprogs.
>>
>
> We had some discussion about this this week in SF (at the ext4 BoF at the
> Linux Collaboration Summit).  Beyond checksumming the inode itself, it
> would be more useful to checksum the extent indexing blocks, as the ext3
> corruption actually happened at the indirect block.
>
> The idea is to reduce the eh_max (the max # of extents stored in this
> block) to save some space to store the checksums in the block,
>
> /*
>  * Each block (leaves and indexes), even inode-stored has header.
>  */
> struct ext4_extent_header {
>         __le16  eh_magic;       /* probably will support different formats */
>         __le16  eh_entries;     /* number of valid entries */
>         __le16  eh_max;         /* capacity of store in entries */
>         __le16  eh_depth;       /* has tree real underlying blocks? */
>         __le32  eh_generation;  /* generation of the tree */
> };
>
> This would make us a RO feature to checksum the leaves and indexes
> blocks too.

I proposed this quite a long time ago on ext2-devel (in the threads
"topics for the file system mini-summit" and "extents in e2fsprogs",
June 2006), under the name "ext3_extent_tail".  In fact there is already
some rudimentary allowance for the extent tail in
ext2fs_extent_header_verify(), so that it doesn't complain if eh_max is
1 or 2 less than the actual maximum number of extents that could fit
into the block.

The proposed structure from the old emails looked like:

struct ext4_extent_tail { /* optional, if eh_max allows it, and flagged */
        __le64  et_inum;
        __le32  et_igeneration;
        __le32  et_checksum;
};

Whether we really need et_inum to be a 64-bit value is subject to debate
at this point, but because the index/extent entries are 12 bytes in size
there is always going to be 16 bytes available to hold something.
Perhaps we could put a magic value there, one high enough never to
conflict with an inode number, if we ever get that far?

Cheers, Andreas
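
The space arithmetic behind the "16 bytes available" remark can be sketched as follows. This is a minimal illustration, assuming the 12-byte header and 12-byte index/extent entries described in the thread and the 16-byte tail layout quoted from the old emails. The helper names (extent_entries_plain, extent_entries_with_tail, extent_tail_space) and struct extent_tail_sketch are hypothetical and do not come from the kernel or e2fsprogs.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes implied by the thread: the header shown above is 2+2+2+2+4 = 12
 * bytes, and each index/extent entry is 12 bytes. */
#define EXT_HEADER_SIZE 12u
#define EXT_ENTRY_SIZE  12u

/* The proposed tail, written with plain fixed-width types for illustration;
 * on disk the fields would be little-endian (__le64/__le32). */
struct extent_tail_sketch {
    uint64_t et_inum;
    uint32_t et_igeneration;
    uint32_t et_checksum;
};
_Static_assert(sizeof(struct extent_tail_sketch) == 16,
               "tail is expected to occupy 16 bytes");

/* Hypothetical helper: entries that fit in a block with no tail. */
static unsigned extent_entries_plain(unsigned blocksize)
{
    assert(blocksize > EXT_HEADER_SIZE);
    return (blocksize - EXT_HEADER_SIZE) / EXT_ENTRY_SIZE;
}

/* Hypothetical helper: entries that fit once room for the tail is reserved. */
static unsigned extent_entries_with_tail(unsigned blocksize)
{
    unsigned reserved = EXT_HEADER_SIZE
                      + (unsigned)sizeof(struct extent_tail_sketch);
    assert(blocksize > reserved);
    return (blocksize - reserved) / EXT_ENTRY_SIZE;
}

/* Bytes left over after the header and the reduced number of entries. */
static unsigned extent_tail_space(unsigned blocksize)
{
    return blocksize - EXT_HEADER_SIZE
         - extent_entries_with_tail(blocksize) * EXT_ENTRY_SIZE;
}
```

Under these assumptions, a 4096-byte block goes from 340 entries to 339 and leaves exactly 16 bytes free, while a 2048-byte block goes from 169 to 168 and leaves 20. In both cases giving up a single entry is enough to hold the tail.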