Re: [PATCH 03/13] fsverity: support block-based Merkle tree caching

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Wed, 24 Apr 2024 17:27:06 -0700

On Wed, Apr 24, 2024 at 10:08:58PM +0000, Eric Biggers wrote:
> On Wed, Apr 24, 2024 at 02:25:23PM -0700, Darrick J. Wong wrote:
> > > For checking whether the bitmap is in use, it's simpler and more efficient to
> > > just directly check whether vi->hash_block_verified is NULL, as the existing
> > > code does.  Only the code that allocates the bitmap actually needs to be aware
> > > of the details of when the bitmap gets enabled.
> > > 
> > > fsverity_caches_blocks() has a similar issue, where it could just be replaced
> > > with checking vops->read_merkle_tree_block != NULL directly (or equivalently
> > > vops->drop_merkle_tree_block, which works well in
> > > fsverity_drop_merkle_tree_block() since that's the function pointer it's
> > > calling).  The WARN_ON_ONCE() should be done in fsverity_create_info(), not
> > > inlined multiple times into the verification code.
> > 
> > Ok, I'll move the WARN_ON_ONCE there:
> > 
> > 	/*
> > 	 * If the filesystem implementation supplies Merkle tree content on a
> > 	 * per-block basis, it must implement both the read and drop functions.
> > 	 * If it supplies content on a per-page basis, neither should be
> > 	 * provided.
> > 	 */
> > 	if (vops->read_merkle_tree_block)
> > 		WARN_ON_ONCE(vops->drop_merkle_tree_block == NULL);
> > 	else
> > 		WARN_ON_ONCE(vops->drop_merkle_tree_block != NULL);
> 
> Checking ->read_merkle_tree_page would make sense too, right?  I.e. we should
> enforce that one of the following two cases holds:
> 
>     1. read_merkle_tree_page != NULL &&
>        read_merkle_tree_block == NULL &&
>        drop_merkle_tree_block == NULL
> 
>     2. read_merkle_tree_page == NULL &&
>        read_merkle_tree_block != NULL &&
>        drop_merkle_tree_block != NULL

<nod>

	if (vops->read_merkle_tree_page)
		WARN_ON_ONCE(vops->read_merkle_tree_block != NULL ||
			     vops->drop_merkle_tree_block != NULL);
	else
		WARN_ON_ONCE(vops->read_merkle_tree_block == NULL ||
			     vops->drop_merkle_tree_block == NULL);

> > > > +		bytes_to_copy = min_t(u64, end_pos - pos,
> > > > +				      params->block_size - offs_in_block);
> > > > +
> > > > +		err = fsverity_read_merkle_tree_block(inode, &vi->tree_params,
> > > > +				pos - offs_in_block, ra_bytes, &block);
> > > > +		if (err) {
> > > >  			fsverity_err(inode,
> > > > -				     "Error %d reading Merkle tree page %lu",
> > > > -				     err, index);
> > > > +				     "Error %d reading Merkle tree block %llu",
> > > > +				     err, pos);
> > > >  			break;
> > > 
> > > The error message should go into fsverity_read_merkle_tree_block() so that it
> > > does not need to be duplicated in its two callers.  This would, additionally,
> > > eliminate the need to introduce the 'err' variable in verify_data_block().
> > 
> > Do you want that to be a separate cleanup patch?
> 
> I think it makes sense to do it in this patch, since this patch introduces
> fsverity_read_merkle_tree_block().  This patch also adds the 'err' variable back
> to verify_data_block(), which may mislead people into thinking it's the actual
> error code that users see (this has happened before) --- that could be avoided.

Let's rename it "bad" in verify.c then.  I'm assuming that you /do/ want
the error code in fsverity_read_merkle_tree?

> > > > +int fsverity_read_merkle_tree_block(struct inode *inode,
> > > > +				    const struct merkle_tree_params *params,
> > > > +				    u64 pos, unsigned long ra_bytes,
> > > > +				    struct fsverity_blockbuf *block)
> > > > +{
> > > > +	const struct fsverity_operations *vops = inode->i_sb->s_vop;
> > > > +	unsigned long page_idx;
> > > > +	struct page *page;
> > > > +	unsigned long index;
> > > > +	unsigned int offset_in_page;
> > > > +
> > > > +	if (fsverity_caches_blocks(inode)) {
> > > > +		block->verified = false;
> > > > +		return vops->read_merkle_tree_block(inode, pos, ra_bytes,
> > > > +				params->log_blocksize, block);
> > > > +	}
> > > > +
> > > > +	index = pos >> params->log_blocksize;
> > > 
> > > Should the fourth parameter to ->read_merkle_tree_block be the block index
> > > (which is computed above) instead of log_blocksize?  XFS only uses
> > > params->log_blocksize to compute the block index anyway.
> > 
> > I don't know.  XFS is infamous for everyone having a different opinion
> > and developers being unable to drive a consensus and being able to get a
> > patchset to completion.  So:
> > 
> > Andrey wrote an implementation that used the buffer cache and used block
> > indexes to load/store from the xattr structure.  I didn't like that
> > because layering violation.
> > 
> > willy suggested hanging an xarray off struct xfs_inode and using that to
> > cache merkle tree blocks.  For that design, we want the xarray indexes
> > in units of blocks to conserve space, which means passing in (pos,
> > log_blocksize) or a direct block index.
> > 
> > Christoph then suggested using a per-AG rhashtable to reduce per-inode
> > memory overhead, in which case the lookup key can be either (inumber,
> > pos) or (inumber, block index).  This is a better design if there aren't
> > really going to be that many xfs inodes with verity enabled, though we
> > lose per-inode sharding of the merkle blocks.
> > 
> > Dave thinks that verity inodes could constitute the majority of xfs
> > inodes some day and it should be in the inode instead.
> > 
> > Andrey and I do not have crystal balls, we have no idea how this
> > dchinner/hch disagreement will play out.  Earlier I got the sense that
> > you wanted to move towards expressing all the merkle tree info in units
> > of bytes.
> > 
> > In a lot of ways it would be preferable to move to block indexes
> > instead, since there's never going to be a meaningful access to merkle
> > position 1337.  But you're the maintainer, so it's up to you. :)
> 
> It makes sense to cache the blocks by index regardless of whether you get passed
> the index directly.  So this is just a question of what the
> ->read_merkle_tree_block function prototype should be.
> 
> Currently the other fsverity_operations functions are just aligned with what the
> filesystems actually need: ->read_merkle_tree_page takes a page index, whereas
> ->write_merkle_tree_block takes a pos and size.  xfs_fsverity_read_merkle()
> needs pos, index, and size, and fsverity_read_merkle_tree_block() computes the
> index already.  So that's why I'd probably go with pos, index, and size instead
> of pos, log2_blocksize, and size.  There's no right answer though.

How about just index and size?  I think XFS will work just fine without
the byte position since it's using that as a key for the cache and the
ondisk xattrs anyway.  Even if we can't decide on which data structure
to use for caching. ;)

--D

> 
> - Eric
>