On Wed, Apr 24, 2024 at 01:23:48PM -0700, Darrick J. Wong wrote:
> > > How about "the hash of an i_blocksize-sized buffer of zeroes" for all
> > > three?
> >
> > It's the Merkle tree block size, not the filesystem block size.  Or did you
> > actually intend for this to use the filesystem block size?
>
> I actually did intend for this to be the fs block size, not the merkle
> tree block size.  It's the bottom level that I care about shrinking.
> Let's say that data[0-B] are the data blocks:
>
> root
> +-internal0
> | +-leaf0
> | | +-data0
> | | +-data1
> | | `-data2
> | `-leaf1
> |   +-data3
> |   +-data4
> |   `-data5
> `-internal1
>   +-leaf2
>   | +-data6
>   | +-data7
>   | `-data8
>   `-leaf3
>     +-data9
>     +-dataA
>     `-dataB
>
> (thanks to https://arthursonzogni.com/Diagon/#Tree )
>
> If data[3-5] are completely zeroes (unwritten blocks, sparse holes,
> etc.) then I want to skip writing leaf1 of the merkle tree to disk.
>
> If it happens that the hashes of leaf[0-1] match hash(data3) then it's
> frosting on top (as it were) that we can also skip internal0.  However,
> the merkle tree has a high fanout factor (4096/32==128 in the common
> case), so I care /much/ less about eliding those levels.
>
> > In struct merkle_tree_params, the "block size" is always the Merkle tree block
> > size, so the type of block size seems clear in that context.  My complaint was
> > just that it used the term "data block" to mean a block that is not necessarily
> > a file contents block (which is what "data block" means elsewhere).
>
> Hm.  Given the confusion, would it help if I said that zero_digest
> should only be used to elide leaf nodes of the merkle tree that hash the
> contents of file content blocks?  Or is "the hash of an
> i_blocksize-sized buffer of zeroes" sufficient?
>
> What do you think of the commit message saying:
>
> "Compute the hash of one filesystem block's worth of zeroes.  Any merkle
> tree leaf block containing only this hash can be elided at write time,
> and its contents synthesized at read time.
>
> "Let's pretend that there's a file containing six data blocks and whose
> merkle tree looks roughly like this:
>
> root
> +--leaf0
> |  +--data0
> |  +--data1
> |  `--data2
> `--leaf1
>    +--data3
>    +--data4
>    `--data5
>
> "If data[0-2] are sparse holes, then leaf0 will contain a repeating
> sequence of @zero_digest.  Therefore, leaf0 need not be written to disk
> because its contents can be synthesized."

It sounds like you're assuming that the file data is always hashed in
filesystem-block-sized units.  That's not how it works.  The block size
that's selected for fsverity (which is a power of 2 between 1024 and
min(fs_block_size, PAGE_SIZE), inclusively) is used for both the data
blocks and the Merkle tree blocks.
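To make that range concrete, here's a quick sketch in C of the
constraint (the helper name and the hardcoded PAGE_SIZE are made up for
illustration; this is not the kernel's actual validation code):

#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096	/* assumption for this sketch */

/*
 * Hypothetical helper: is @block_size a valid fsverity block size on a
 * filesystem whose block size is @fs_block_size?  It must be a power of
 * 2 between 1024 and min(fs_block_size, PAGE_SIZE), inclusively; the
 * same size is then used for both data blocks and hash blocks.
 */
static bool fsverity_block_size_ok(size_t block_size, size_t fs_block_size)
{
	size_t max_size = fs_block_size < PAGE_SIZE ? fs_block_size
						    : PAGE_SIZE;

	/* Must be a power of 2... */
	if (block_size == 0 || (block_size & (block_size - 1)) != 0)
		return false;

	/* ...between 1024 and min(fs_block_size, PAGE_SIZE), inclusively. */
	return block_size >= 1024 && block_size <= max_size;
}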
Using one block size for both is intentional, so that people can e.g.
calculate the fsverity digest of a file using a 4K block size, then
deploy the file to both filesystems that use a 4K filesystem block size
and filesystems that use a 16K filesystem block size, and get the same
fsverity file digest each time.

I've considered offering the ability to configure the data block size
separately from the Merkle tree block size, like dm-verity does.  That
hasn't seemed useful, though, and in any case it should not be tied to
the FS block size.

A better way to think about things might be that the Merkle tree
actually *includes* the data, as opposed to being separate from it.  In
this respect, it's natural that the Merkle tree parameters, including
block size, hash algorithm, and salt, apply both to blocks that contain
file data and to blocks that contain hashes.

In general, all the fsverity code and documentation probably need to be
clearer about whether "the Merkle tree" means just the hash blocks or
the conceptual full tree that includes the file's data.

- Eric