On Wed, Dec 14, 2022 at 09:11:39AM +1100, Dave Chinner wrote:
> On Tue, Dec 13, 2022 at 12:50:28PM -0800, Eric Biggers wrote:
> > On Tue, Dec 13, 2022 at 06:29:24PM +0100, Andrey Albershteyn wrote:
> > > Not yet implemented:
> > > - No pre-fetching of Merkle tree pages in the
> > >   read_merkle_tree_page()
> >
> > This would be helpful, but not essential.
> >
> > > - No marking of already verified Merkle tree pages (each read, the
> > >   whole tree is verified).
>
> Ah, I wasn't aware that this was missing.
>
> >
> > This is essential to have, IMO.
> >
> > You *could* do what btrfs does, where it caches the Merkle tree pages in the
> > inode's page cache past i_size, even though btrfs stores the Merkle tree
> > separately from the file data on-disk.
> >
> > However, I'd guess that the other XFS developers would have an aversion to that
> > approach, even though it would not affect the on-disk storage.
>
> Yup, on an architectural level it just seems wrong to cache secure
> verification metadata in the same user accessible address space as
> the data it verifies.
>
> > The alternatives would be to create a separate in-memory-only inode for the
> > cache, or to build a custom cache with its own shrinker.
>
> The Merkle tree blocks are cached in the XFS buffer cache.
>
> Andrey, could we just add a new flag to the xfs_buf->b_flags to
> indicate that the buffer contains verified Merkle tree records?
> i.e. if it's not set after we've read the buffer, we need to verify
> the buffer and set the flag; if it is set, we have a verified buffer
> in cache and we can skip the verification?

Well, my proposal at
https://lore.kernel.org/r/20221028224539.171818-2-ebiggers@xxxxxxxxxx is to keep
tracking the "verified" status at the individual Merkle tree block level, by
adding a bitmap fsverity_info::hash_block_verified.  That is part of the
fs/verity/ infrastructure, and all filesystems would be able to use it.
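
To make the idea concrete, here is a rough userspace sketch of per-block
tracking with a bitmap.  It is not the code from the patch: struct verity_state,
verify_block_hash(), ensure_block_verified() and buffer_instantiated() are
hypothetical names for illustration only, the hash check is a stub, and real
kernel code would need atomic bit operations (e.g. test_bit()/set_bit()).  The
buffer_instantiated() hook stands in for whatever tells us that a cache buffer
was evicted and read back in.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the per-file verity state. */
struct verity_state {
	unsigned long *hash_block_verified; /* one bit per Merkle tree block */
	size_t num_blocks;
	unsigned long verify_calls;         /* demo-only counter */
};

static int verity_state_init(struct verity_state *vs, size_t num_blocks)
{
	size_t words = (num_blocks + BITS_PER_WORD - 1) / BITS_PER_WORD;

	vs->hash_block_verified = calloc(words ? words : 1, sizeof(unsigned long));
	if (!vs->hash_block_verified)
		return -1;
	vs->num_blocks = num_blocks;
	vs->verify_calls = 0;
	return 0;
}

static bool block_is_verified(const struct verity_state *vs, size_t idx)
{
	assert(idx < vs->num_blocks);
	return (vs->hash_block_verified[idx / BITS_PER_WORD] >>
		(idx % BITS_PER_WORD)) & 1UL;
}

static void mark_block_verified(struct verity_state *vs, size_t idx)
{
	assert(idx < vs->num_blocks);
	vs->hash_block_verified[idx / BITS_PER_WORD] |= 1UL << (idx % BITS_PER_WORD);
}

/* Stub: a real implementation would hash the block and compare the result
 * against the hash stored in its (already verified) parent block. */
static bool verify_block_hash(struct verity_state *vs, size_t idx)
{
	(void)idx;
	vs->verify_calls++;
	return true;
}

/* Verify a tree block only if its bit is not already set. */
static bool ensure_block_verified(struct verity_state *vs, size_t idx)
{
	if (block_is_verified(vs, idx))
		return true;
	if (!verify_block_hash(vs, idx))
		return false;
	mark_block_verified(vs, idx);
	return true;
}

/* A cache buffer holding blocks [first, first + count) was newly
 * instantiated, so its contents can no longer be trusted: clear the bits. */
static void buffer_instantiated(struct verity_state *vs, size_t first, size_t count)
{
	for (size_t i = first; i < first + count && i < vs->num_blocks; i++)
		vs->hash_block_verified[i / BITS_PER_WORD] &=
			~(1UL << (i % BITS_PER_WORD));
}
```

With this shape, a data read only pays for the tree blocks on its own path that
have not been verified since they were last read in; sibling blocks in the same
cache buffer stay untouched until something actually needs them.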

However, since it's necessary to re-verify blocks that have been evicted and
then re-instantiated, my patch also repurposes PG_checked as an indicator for
whether the Merkle tree pages are newly instantiated.  For a "non-page-cache
cache", that part would need to be replaced with something equivalent.

A different approach would be to make it so that every time a page (or "cache
buffer", to call it something more generic) of N Merkle tree blocks is read,
then all N of those blocks are verified immediately.  Then there would be no
need to track the "verified" status of individual blocks.

My concerns with that approach are:

  * Most data reads only need a single Merkle tree block at the deepest level.
    If at least N tree blocks were verified any time that any were verified at
    all, that would make the worst-case read latency worse.

  * It's possible that the parents of N tree blocks are split across a cache
    buffer boundary.  Thus, while N blocks can't have more than N parents, and
    in practice would just have 1-2, those 2 parents could be split into two
    separate cache buffers, with a total length of 2*N.  Verifying all of those
    would really increase the worst-case latency as well.

So I'm thinking that tracking the "verified" status of tree blocks individually
is the right way to go.  But I'd appreciate any other thoughts on this.

- Eric