Re: [PATCH v5 07/24] fsverity: support block-based Merkle tree caching

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 07, 2024 at 01:54:01PM -0800, Darrick J. Wong wrote:
> On Tue, Mar 05, 2024 at 07:56:22PM -0800, Eric Biggers wrote:
> > On Mon, Mar 04, 2024 at 08:10:30PM +0100, Andrey Albershteyn wrote:
> > > diff --git a/fs/verity/fsverity_private.h b/fs/verity/fsverity_private.h
> > > index b3506f56e180..dad33e6ff0d6 100644
> > > --- a/fs/verity/fsverity_private.h
> > > +++ b/fs/verity/fsverity_private.h
> > > @@ -154,4 +154,12 @@ static inline void fsverity_init_signature(void)
> > >  
> > >  void __init fsverity_init_workqueue(void);
> > >  
> > > +/*
> > > + * Drop 'block' obtained with ->read_merkle_tree_block(). Calls out back to
> > > + * filesystem if ->drop_block() is set, otherwise, drop the reference in the
> > > + * block->context.
> > > + */
> > > +void fsverity_drop_block(struct inode *inode,
> > > +			 struct fsverity_blockbuf *block);
> > > +
> > >  #endif /* _FSVERITY_PRIVATE_H */
> > 
> > This should be paired with a helper function that reads a Merkle tree block by
> > calling ->read_merkle_tree_block or ->read_merkle_tree_page as needed.  Besides
> > being consistent with having a helper function for drop, this would prevent code
> > duplication between verify_data_block() and fsverity_read_merkle_tree().
> > 
> > I recommend that it look like this:
> > 
> > int fsverity_read_merkle_tree_block(struct inode *inode, u64 pos,
> > 				    unsigned long ra_bytes,
> > 				    struct fsverity_blockbuf *block);
> > 
> > 'pos' would be the byte position of the block in the Merkle tree, and 'ra_bytes'
> > would be the number of bytes for the filesystem to (optionally) readahead if the
> > block is not yet cached.  I think that things work out simpler if these values
> > are measured in bytes, not blocks.  'block' would be at the end because it's an
> > output, and it can be confusing to interleave inputs and outputs in parameters.
> 
> FWIW I don't really like 'pos' here because that's usually short for
> "file position", which is a byte, and this looks a lot more like a
> merkle tree block number.
> 
> u64 blkno?
> 
> Or better yet use a typedef ("merkle_blkno_t") to make it really clear
> when we're dealing with a tree block number.  Ignore checkpatch
> complaining about typeedefs. :)

My suggestion is for 'pos' to be a byte position, in alignment with the
pagecache naming convention as well as ->write_merkle_tree_block which currently
uses a 'u64 pos' byte position too.

It would also work for it to be a block index, in which case it should be named
'index' or 'blkno' and have type unsigned long.  I *think* that things work out
a bit cleaner if it's a byte position, but I could be convinced that it's
actually better for it to be a block index.

> > How about changing the prototype to:
> > 
> > void fsverity_invalidate_merkle_tree_block(struct inode *inode, u64 pos);
> >
> > Also, is it a kernel bug for the pos to be beyond the end of the Merkle tree, or
> > can it happen in cases like filesystem corruption?  If it can only happen due to
> > kernel bugs, WARN_ON_ONCE() might be more appropriate than an error message.
> 
> I think XFS only passes to _invalidate_* the same pos that was passed to
> ->read_merkle_tree_block, so this is a kernel bug, not a fs corruption
> problem.
> 
> Perhaps this function ought to note that @pos is supposed to be the same
> value that was given to ->read_merkle_tree_block?
> 
> Or: make the implementations return 1 for "reloaded from disk", 0 for
> "still in cache", or a negative error code.  Then fsverity can call
> the invalidation routine itself and XFS doesn't have to worry about this
> part.
> 
> (I think?  I have questions about the xfs_invalidate_blocks function.)

It looks like XFS can invalidate blocks other than the one being read by
->read_merkle_tree_block.

If it really was only a matter of the single block being read, then it would
indeed be simpler to just make it a piece of information returned from
->read_merkle_tree_block.

If the generic invalidation function is needed, it needs to be clearly
documented when filesystems are expected to invalidate blocks.

> > > +
> > > +	/**
> > > +	 * Release the reference to a Merkle tree block
> > > +	 *
> > > +	 * @block: the block to release
> > > +	 *
> > > +	 * This is called when fs-verity is done with a block obtained with
> > > +	 * ->read_merkle_tree_block().
> > > +	 */
> > > +	void (*drop_block)(struct fsverity_blockbuf *block);
> > 
> > drop_merkle_tree_block, so that it's clearly paired with read_merkle_tree_block
> 
> Yep.  I noticed that xfs_verity.c doesn't put them together, which made
> me wonder if the write_merkle_tree_block path made use of that.  It
> doesn't, AFAICT.
> 
> And I think the reason is that when we're setting up the merkle tree,
> we want to stream the contents straight to disk instead of ending up
> with a huge cache that might not all be necessary?

In the current patchset, fsverity_blockbuf isn't used for writes (despite the
comment saying it is).  I think that's fine and that it keeps things simpler.
Filesystems can still cache the blocks that are passed to
->write_merkle_tree_block if they want to.  That already happens on ext4 and
f2fs; FS_IOC_ENABLE_VERITY results in the Merkle tree being in the pagecache.

- Eric




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux