On Sat, 28 April 2007 17:05:22 -0500, Matt Mackall wrote:
>
> This is a relatively simple scheme for making a filesystem with
> incremental online consistency checks of both data and metadata.
> Overhead can be well under 1% disk space and CPU overhead may also be
> very small, while greatly improving filesystem integrity.

I like it a lot.  So far it appears to solve more problems and cause
fewer new problems than ChunkFS.

> [...]
>
> Divide disk into a bunch of tiles. For each tile, allocate a one
> block tile header that contains (inode, checksum) pairs for each
> block in the tile. Unused blocks get marked inode -1, filesystem
> metadata blocks -2. The first element contains a last-clean
> timestamp, a clean flag and a checksum for the block itself. For 4K
> blocks with 32-bit inode and CRC, that's 512 blocks per tile (2MB),
> with ~.2% overhead.

You should add a 64-bit fpos field, recording each block's position
within its file.  That allows you to easily check for addressing errors.

> [Note that CRCs are optional so we can cut the overhead in half. I
> choose CRCs here because they're capable of catching the vast
> majority of accidental corruptions at a small cost and mostly serve
> to protect against errors not caught by on-disk ECC (eg cable noise,
> kernel bugs, cosmic rays). Replacing CRCs with a stronger hash like
> SHA-n is perfectly doable.]

The checksum cannot protect against a maliciously prepared medium
anyway, so crypto makes little sense.  CRC32 can provably (if you trust
those who did the proof) detect all 1, 2 and 3-bit errors and has only a
1:2^32 chance of missing any remaining errors.  That is fairly hard to
improve on.

> Every time we write to a tile, we must mark the tile dirty. To cut
> down time to find dirty tiles, the clean bits can be collected into a
> smaller set of blocks, one clean bitmap block per 64GB of data.

You can and possibly should organize this as a tree, similar to a file.
One bit at the lowest level marks a tile as dirty.
One bit for each indirect block pointer marks some tiles behind the
pointer as dirty.  That scales logarithmically to any filesystem size.

Jörn

-- 
I don't understand it. Nobody does.
-- Richard P. Feynman
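
P.S. To make the dirty-bit tree above a bit more concrete, here is a
rough in-memory sketch in Python.  Everything in it is an assumption
made for illustration: the DirtyTree name, the fanout of 4, and the use
of plain lists instead of on-disk bitmap blocks.  It only shows the
idea that a set bit at an upper level means "something below is dirty",
so a scan can skip clean subtrees entirely.

```python
class DirtyTree:
    """Hypothetical in-memory model of a hierarchical dirty bitmap."""

    def __init__(self, num_tiles, fanout=4):
        self.fanout = fanout
        # levels[0] holds one flag per tile; each higher level holds one
        # flag per group of `fanout` flags in the level below it.
        self.levels = [[False] * num_tiles]
        while len(self.levels[-1]) > 1:
            n = -(-len(self.levels[-1]) // fanout)  # ceiling division
            self.levels.append([False] * n)

    def mark_dirty(self, tile):
        idx = tile
        for level in self.levels:
            if level[idx]:
                break  # this flag and all its ancestors are already set
            level[idx] = True
            idx //= self.fanout

    def mark_clean(self, tile):
        idx = tile
        for depth, level in enumerate(self.levels):
            level[idx] = False
            if depth + 1 == len(self.levels):
                break  # reached the root
            start = (idx // self.fanout) * self.fanout
            if any(level[start:start + self.fanout]):
                break  # a sibling is still dirty, so the parent stays set
            idx //= self.fanout

    def dirty_tiles(self):
        """Yield dirty tile numbers, descending only into set flags."""
        def walk(depth, idx):
            if not self.levels[depth][idx]:
                return
            if depth == 0:
                yield idx
                return
            start = idx * self.fanout
            end = min(start + self.fanout, len(self.levels[depth - 1]))
            for child in range(start, end):
                yield from walk(depth - 1, child)

        top = len(self.levels) - 1
        for idx in range(len(self.levels[top])):
            yield from walk(top, idx)


tree = DirtyTree(num_tiles=100)
tree.mark_dirty(5)
tree.mark_dirty(77)
print(list(tree.dirty_tiles()))
```

With 100 tiles and a fanout of 4 this builds five levels (100, 25, 7, 2
and 1 flags), and finding the dirty tiles touches only the flags on the
paths down to tiles 5 and 77.  On disk the fanout would more likely be
the number of bits or pointers that fit in one block.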