On Mon, Sep 13, 2021 at 11:44:37AM -0700, Boris Burkov wrote:
> +_fsv_scratch_begin_subtest "way too big: fail on first merkle block"
> +# have to go back by 4096 from max to not hit the fsverity MAX_LEVELS check.
> +truncate -s $(($max_sz - 4095)) $fsv_file
> +_fsv_enable $fsv_file |& _filter_scratch

This is actually a kernel bug, so please don't work around it in the test :-(
It will be fixed by the kernel patch
https://lore.kernel.org/linux-fscrypt/20210916203424.113376-1-ebiggers@xxxxxxxxxx

> +
> +# The goal of this second test is to make a big enough file that we trip the
> +# EFBIG codepath, but not so big that we hit it immediately as soon as we try
> +# to write a Merkle leaf. Because of the layout of the Merkle tree that
> +# fs-verity uses, this is a bit complicated to compute dynamically.
> +
> +# The layout of the Merkle tree has the leaf nodes last, but writes them first.
> +# To get an interesting overflow, we need the start of L0 to be < MAX but the
> +# end of the merkle tree (EOM) to be past MAX. Ideally, the start of L0 is only
> +# just smaller than MAX, so that we don't have to write many blocks to blow up,
> +# but we take some liberties with adding alignments rather than computing them
> +# correctly, so we under-estimate the perfectly sized file.
> +
> +# We make the following assumptions to arrive at a Merkle tree layout:
> +# The Merkle tree is stored past EOF aligned to 64k.
> +# 4K blocks and pages
> +# Merkle tree levels aligned to the block (not pictured)
> +# SHA-256 hashes (32 bytes; 128 hashes per block/page)
> +# 64 bit max file size (and thus 8 levels)
> +
> +# 0                       EOF round-to-64k  L7L6L5 L4  L3   L2    L1     L0 MAX EOM
> +# |-------------------------|              ||-|--|---|----|-----|------|--|!!!!!|
> +
> +# Given this structure, we can compute the size of the file that yields the
> +# desired properties. (NB the diagram skips the block alignment of each level)
> +# sz + 64k + sz/128^8 + 4k + sz/128^7 + 4k + ... + sz/128^2 + 4k < MAX
> +# sz + 64k + 7(4k) + sz/128^8 + sz/128^7 + ... + sz/128^2 < MAX
> +# sz + 92k + sz/128^2 < MAX
> +# (128^8)sz + (128^8)92k + sz + (128)sz + (128^2)sz + ... + (128^6)sz < (128^8)MAX
> +# sz(128^8 + 128^6 + 128^5 + 128^4 + 128^3 + 128^2 + 128 + 1) < (128^8)(MAX - 92k)
> +# sz < (128^8/(128^8 + (128^6 + ... + 128 + 1)))(MAX - 92k)
> +#
> +# Do the actual calculation with 'bc' and 20 digits of precision.
> +# set -f prevents the * from being expanded into the files in the cwd.
> +set -f
> +calc="scale=20; ($max_sz - 94208) * ((128^8) / (1 + 128 + 128^2 + 128^3 + 128^4 + 128^5 + 128^6 + 128^8))"
> +sz=$(echo $calc | $BC -q | cut -d. -f1)
> +set +f
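(To make the quoted math concrete: here is a quick throwaway sketch, not part
of the test, that lays the tree out the way the comment describes and checks
where L0 lands. The max_sz below is a made-up stand-in for whatever the
filesystem actually reports, and plain 'bc' stands in for the test's $BC.)

	#!/bin/bash
	# Hypothetical max file size; the real test queries the filesystem.
	max_sz=$((2**62))

	# Same bound as in the quoted patch.
	calc="scale=20; ($max_sz - 94208) * ((128^8) / (1 + 128 + 128^2 + 128^3 + 128^4 + 128^5 + 128^6 + 128^8))"
	sz=$(echo "$calc" | bc -q | cut -d. -f1)

	# Contents rounded up to 64k, then L7 (sz/128^8) through L1
	# (sz/128^2), each rounded up to a 4k block.
	end=$(( (sz + 65535) / 65536 * 65536 ))
	for i in 8 7 6 5 4 3 2; do
		lvl=$(echo "$sz / 128^$i" | bc)
		end=$(( (end + lvl + 4095) / 4096 * 4096 ))
	done

	# L0 is sz/128; it should start below MAX and end well past it.
	l0=$(echo "$sz / 128" | bc)
	echo "L0 starts $((max_sz - end)) bytes below MAX"
	echo "L0 ends $((end + l0 - max_sz)) bytes past MAX"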
It's hard to follow the above explanation. I'm still wondering whether it
could be simplified a lot. Maybe something like the following:

# The goal of this second test is to make a big enough file that we trip the
# EFBIG codepath, but not so big that we hit it immediately when writing the
# first Merkle leaf.
#
# The Merkle tree is stored with the leaf node level (L0) last, but it is
# written first. To get an interesting overflow, we need the maximum file size
# (MAX) to be in the middle of L0 -- ideally near the beginning of L0 so that we
# don't have to write many blocks before getting an error.
#
# With SHA-256 and 4K blocks, there are 128 hashes per block. Thus, ignoring
# padding, L0 is 1/128 of the file size while the other levels in total are
# 1/128**2 + 1/128**3 + 1/128**4 + ... = 1/16256 of the file size. So, still
# ignoring padding, for L0 to start exactly at MAX, the file size must be s such
# that s + s/16256 = MAX, i.e. s = MAX * (16256/16257). Then to get a file size
# where MAX occurs *near* the start of L0 rather than *at* the start, we can
# just subtract an overestimate of the padding: 64K after the file contents,
# then 4K per level, where considering 8 levels is sufficient.
sz=$(echo "scale=20; $max_sz * (16256/16257) - 65536 - 4096*8" | $BC -q | cut -d. -f1)

That gives a size only 4103 bytes different from your calculation, and IMO is
much easier to understand.

- Eric
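P.S. If it helps, the two formulas can be compared directly with another
throwaway snippet (again with a made-up max_sz and plain 'bc' in place of the
test's $BC):

	#!/bin/bash
	# Hypothetical max file size; substitute the real filesystem limit.
	max_sz=$((2**62))

	# Original calculation from the patch.
	calc="scale=20; ($max_sz - 94208) * ((128^8) / (1 + 128 + 128^2 + 128^3 + 128^4 + 128^5 + 128^6 + 128^8))"
	sz_old=$(echo "$calc" | bc -q | cut -d. -f1)

	# Simplified calculation suggested above.
	sz_new=$(echo "scale=20; $max_sz * (16256/16257) - 65536 - 4096*8" | bc -q | cut -d. -f1)

	echo "old: $sz_old"
	echo "new: $sz_new"
	echo "difference: $((sz_old - sz_new)) bytes"

The delta is essentially just the difference between the padding the two
formulas subtract (98304 versus an effective ~94202), so it stays around 4K
for any realistic max_sz.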