Re: [RFC PATCH 00/11] fs-verity support for XFS

Eric Biggers <ebiggers@xxxxxxxxxx> · Tue, 13 Dec 2022 22:31:42 -0800

On Wed, Dec 14, 2022 at 09:11:39AM +1100, Dave Chinner wrote:
> On Tue, Dec 13, 2022 at 12:50:28PM -0800, Eric Biggers wrote:
> > On Tue, Dec 13, 2022 at 06:29:24PM +0100, Andrey Albershteyn wrote:
> > > Not yet implemented:
> > > - No pre-fetching of Merkle tree pages in the
> > >   read_merkle_tree_page()
> > 
> > This would be helpful, but not essential.
> > 
> > > - No marking of already verified Merkle tree pages (each read, the
> > >   whole tree is verified).
> 
> Ah, I wasn't aware that this was missing.
> 
> > 
> > This is essential to have, IMO.
> > 
> > You *could* do what btrfs does, where it caches the Merkle tree pages in the
> > inode's page cache past i_size, even though btrfs stores the Merkle tree
> > separately from the file data on-disk.
> >
> > However, I'd guess that the other XFS developers would have an adversion to that
> > approach, even though it would not affect the on-disk storage.
> 
> Yup, on an architectural level it just seems wrong to cache secure
> verification metadata in the same user accessible address space as
> the data it verifies.
> 
> > The alternatives would be to create a separate in-memory-only inode for the
> > cache, or to build a custom cache with its own shrinker.
> 
> The merkel tree blocks are cached in the XFS buffer cache.
> 
> Andrey, could we just add a new flag to the xfs_buf->b_flags to
> indicate that the buffer contains verified merkle tree records?
> i.e. if it's not set after we've read the buffer, we need to verify
> the buffer and set th verified buffer in cache and we can skip the
> verification?

Well, my proposal at
https://lore.kernel.org/r/20221028224539.171818-2-ebiggers@xxxxxxxxxx is to keep
tracking the "verified" status at the individual Merkle tree block level, by
adding a bitmap fsverity_info::hash_block_verified.  That is part of the
fs/verity/ infrastructure, and all filesystems would be able to use it.

However, since it's necessary to re-verify blocks that have been evicted and
then re-instantiated, my patch also repurposes PG_checked as an indicator for
whether the Merkle tree pages are newly instantiated.  For a "non-page-cache
cache", that part would need to be replaced with something equivalent.

A different aproach would be to make it so that every time a page (or "cache
buffer", to call it something more generic) of N Merkle tree blocks is read,
then all N of those blocks are verified immediately.  Then there would be no
need to track the "verified" status of individual blocks.

My concerns with that approach are:

  * Most data reads only need a single Merkle tree block at the deepest level.
    If at least N tree blocks were verified any time that any were verified at
    all, that would make the worst-case read latency worse.

  * It's possible that the parents of N tree blocks are split across a cache
    buffer.  Thus, while N blocks can't have more than N parents, and in
    practice would just have 1-2, those 2 parents could be split into two
    separate cache buffers, with a total length of 2*N.  Verifying all of those
    would really increase the worst-case latency as well.

So I'm thinking that tracking the "verified" status of tree blocks individually
is the right way to go.  But I'd appreciate any other thoughts on this.

- Eric