On Fri, Jul 25, 2014 at 05:34:22PM -0700, Darrick J. Wong wrote:
> There's a particular problem with e2fsck's user interface where
> checksum errors are concerned: Fixing the first complaint about
> a checksum problem results in the inode being cleared even if e2fsck
> could otherwise have recovered it.  While this mode is useful for
> cleaning the remaining broken crud off the filesystem, we could at
> least default to checking everything /else/ and only complaining about
> the incorrect checksum if fsck finds nothing else wrong.
>
> So, plumb in a config option.  We default to "verify and checksum"
> unless the user tell us otherwise.

I'm not convinced this is the right way to go.  Telling the user that
they need to muck with the config file depending on what sort of file
system corruption they have seems rather unsatisfying.

This is what I'd much rather do.  Add a "sanity checking" mode to the
inode scanning functions, which gets enabled when EXT2_SF_SANITY_CHECK
is set via ext2fs_inode_scan_flags().  In sanity check mode, every time
the inode scan functions read in a new inode table block, they perform
a "sanity check" on that inode table block.

The sanity check is carried out as follows.  If a majority of the
inodes in the inode table block are "insane", then set the
EXT2_SF_INSANE_ITABLE_BLOCK flag in the scan flags; if not, clear this
flag.  An inode is considered insane if any of the following is true:

  * Its checksum is incorrect.
  * The extent flag is set, and the extent header looks insane.
  * It uses indirect blocks, and more than 50% of the block numbers
    in i_block[] are invalid.

This is basically a simplified version of an algorithm which Andreas
has been carrying in Lustre's e2fsprogs for a while, which tries to
apply a heuristic check over multiple inodes to decide whether we
would be better off just zapping all of the inodes in an inode table
block.
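
To make the proposed sanity check concrete, here is a rough sketch in
C.  It is not code from e2fsprogs: struct sk_inode and every sk_*
function are hypothetical stand-ins, the checksum result is assumed to
be computed elsewhere, i_block[] is assumed to already be in host byte
order, and "invalid block" is assumed to mean a nonzero block number
outside [first_data_block, blocks_count).  Special cases such as fast
symlinks and device inodes, which reuse i_block[] for other data, are
ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_N_BLOCKS   15          /* same value as EXT2_N_BLOCKS */
#define SK_EXTENTS_FL 0x00080000u /* same value as EXT4_EXTENTS_FL */
#define SK_EXT_MAGIC  0xF30Au     /* extent header magic number */

/* Hypothetical, simplified in-memory inode; not struct ext2_inode. */
struct sk_inode {
	uint32_t flags;
	uint32_t block[SK_N_BLOCKS]; /* stand-in for i_block[] */
	bool     csum_ok;            /* checksum verified elsewhere */
};

/* Extent header check: the magic must match and the entry count must
 * not exceed the maximum.  The header starts with four 16-bit fields
 * (magic, entries, max, depth) packed into block[0] and block[1]. */
static bool sk_extent_header_insane(const struct sk_inode *ino)
{
	uint32_t magic   = ino->block[0] & 0xFFFFu;
	uint32_t entries = ino->block[0] >> 16;
	uint32_t max     = ino->block[1] & 0xFFFFu;

	return magic != SK_EXT_MAGIC || entries > max;
}

/* Block-map check: insane if more than half of the i_block[] slots
 * hold an out-of-range block number.  Zero means "hole" and is valid. */
static bool sk_block_map_insane(const struct sk_inode *ino,
				uint32_t first_data_block,
				uint32_t blocks_count)
{
	size_t invalid = 0;

	for (size_t i = 0; i < SK_N_BLOCKS; i++) {
		uint32_t b = ino->block[i];

		if (b != 0 && (b < first_data_block || b >= blocks_count))
			invalid++;
	}
	return invalid * 2 > SK_N_BLOCKS;
}

/* Per-inode verdict, applying the three local checks in order. */
static bool sk_inode_insane(const struct sk_inode *ino,
			    uint32_t first_data_block,
			    uint32_t blocks_count)
{
	if (!ino->csum_ok)
		return true;
	if (ino->flags & SK_EXTENTS_FL)
		return sk_extent_header_insane(ino);
	return sk_block_map_insane(ino, first_data_block, blocks_count);
}

/* Per-block verdict: true if a strict majority of the n inodes in one
 * inode table block are insane.  The caller would set or clear the
 * proposed EXT2_SF_INSANE_ITABLE_BLOCK flag based on this result. */
static bool sk_itable_block_insane(const struct sk_inode *inodes, size_t n,
				   uint32_t first_data_block,
				   uint32_t blocks_count)
{
	size_t insane = 0;

	for (size_t i = 0; i < n; i++)
		if (sk_inode_insane(&inodes[i], first_data_block, blocks_count))
			insane++;
	return insane * 2 > n;
}
```

Using a strict majority means that an inode table block with exactly
half of its inodes damaged is still treated as a collection of
individual problems; whether that is the right tie-break is a judgment
call the text above does not settle.
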

The reason I never integrated Andreas's change into mainline is that,
in order to make it work, it violated a large number of abstractions,
and so I considered it too ugly to live.  The advantage of doing this
all inside lib/ext2fs/inode.c's inode scanning function is that it's
much cleaner.  We can't do as many checks as Andreas did, but for the
rough heuristic of deciding whether we have a minor problem in a
single inode, or a massive problem caused by garbage written into the
inode table or another inode table block getting written into the
wrong place on disk (a case we can only detect if metadata checksums
are enabled, but that's OK), we can get away with doing only the
obvious "local" checks.

After all, in practice, it's usually one of three things: problems in
a single inode (usually caused by a kernel bug or a memory bit flip),
complete garbage written into the inode table block, or an inode table
block written to the wrong place on disk, on top of another inode
table block.  So we just need a rough heuristic to distinguish between
these cases.

Once we've decided whether the entire inode table block is insane or
not, then if an inode has any problems at all during the pass1 scan,
we check to see if its inode table block is marked insane.  If it is,
then we just clear i_links_count and set dtime, effectively zapping
the inode, no questions asked.  Otherwise, we proceed with the
individual fix-ups of each inode field.

Does that make sense?

					- Ted
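
For reference, the pass1 decision described above might look roughly
like the following sketch.  Everything here is hypothetical: the flag
value is made up for illustration (the flag itself is only a
proposal), struct sk_p1_inode holds just the two fields that matter,
and the real pass1 code would go through e2fsck's problem-reporting
machinery instead of returning an enum.

```c
#include <stdint.h>

/* Hypothetical value for the proposed EXT2_SF_INSANE_ITABLE_BLOCK. */
#define SK_SF_INSANE_ITABLE_BLOCK 0x0100

/* Hypothetical stand-in holding only the fields touched by a zap. */
struct sk_p1_inode {
	uint16_t links_count; /* stand-in for i_links_count */
	uint32_t dtime;       /* stand-in for i_dtime */
};

enum sk_action {
	SK_ZAPPED,     /* inode cleared, no questions asked */
	SK_FIX_FIELDS  /* caller should attempt per-field fix-ups */
};

/* Called only when pass1 has found some problem with this inode.
 * scan_flags is the current flag word of the inode scan. */
static enum sk_action sk_handle_problem_inode(struct sk_p1_inode *inode,
					      int scan_flags,
					      uint32_t now)
{
	if (scan_flags & SK_SF_INSANE_ITABLE_BLOCK) {
		/* The whole inode table block looks like garbage. */
		inode->links_count = 0;
		inode->dtime = now;
		return SK_ZAPPED;
	}
	/* Isolated damage: keep the inode and repair it field by field. */
	return SK_FIX_FIELDS;
}
```
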