Re: [PATCH 03/18] fsverity: convert verification to use byte instead of page offsets



On Wed, May 01, 2024 at 03:33:03PM -0700, Darrick J. Wong wrote:
> On Wed, May 01, 2024 at 12:33:14AM -0700, Christoph Hellwig wrote:
> > > +	const u64 end_pos = min(pos + length, vi->tree_params.tree_size);
> > > +	struct backing_dev_info *bdi = inode->i_sb->s_bdi;
> > > +	const u64 max_ra_bytes = min((u64)bdi->io_pages << PAGE_SHIFT,
> > > +				     ULONG_MAX);
> > > +	const struct merkle_tree_params *params = &vi->tree_params;
> > 
> > bdi->io_pages is really a VM readahead concept.  I know this is existing
> > code, but can we rethink why this is even used here?
> I would get rid of it entirely for the merkle-by-block case, since we'd
> have to walk the xattr tree again just to find the next block.  XFS
> ignores the readahead value entirely.
> I think this only makes sense for the merkle-by-page case, and only
> because ext4 and friends are stuffing the merkle data in the posteof
> parts of the file mapping.
> And even then, shouldn't we figure out the amount of readahead going on
> and only ask for enough readahead of the merkle tree to satisfy that
> readahead?

The existing code is:

                unsigned long num_ra_pages =
                        min_t(unsigned long, last_index - index + 1,
                              inode->i_sb->s_bdi->io_pages);
So it does limit the readahead amount to the amount remaining to be read.

In addition, it's limited to io_pages.  It's possible that's not the best value
to use (maybe it should be ra_pages?), but the intent was to just use a large
readahead size, since this code is doing a fully sequential read.

I do think that the concept of Merkle tree readahead makes sense regardless of
how the blocks are being stored.  Having to go to disk every time a new 4K
Merkle tree block is needed increases read latencies.  It doesn't need to be
included in the initial implementation though.

> > And the returned/passed value should be a kernel pointer to the start
> > of the in-memory copy of the block?
> <shrug> This particular callsite is reading merkle data on behalf of an
> ioctl that exports data.  Maybe we want the filesystem's errors to be
> bounced up to userspace?

Yes, I think so.

> > > +static bool is_hash_block_verified(struct inode *inode,
> > > +				   struct fsverity_blockbuf *block,
> > >  				   unsigned long hblock_idx)
> > 
> > Other fsverity code seems to use the (IMHO) much more readable
> > two-tab indentation for prototype continuations, maybe stick to that?
> I'll do that, if Eric says so. :)

My preference is to align continuations with the line that they're continuing:

static bool is_hash_block_verified(struct inode *inode,
				   struct fsverity_blockbuf *block,
				   unsigned long hblock_idx)

> > >
> > >  {
> > > +	struct fsverity_info *vi = inode->i_verity_info;
> > > +	struct page *hpage = (struct page *)block->context;
> > 
> > block->context is a void pointer, no need for casting it.
> Eric insisted on it:

No, I didn't.  It showed up in some code snippets that I suggested, but the
casts originated from the patch itself.  Leaving out the cast is fine with me.

> > > +	for (; level > 0; level--)
> > > +		fsverity_drop_merkle_tree_block(inode, &hblocks[level - 1].block);
> > 
> > Overly long line here.  But the loop kinda looks odd anyway with the
> > extra one-off in the body instead of the loop.
> I /think/ that's a side effect of reusing the value of @level after the
> first loop fails as the initial conditions of the unwind loop.  AFAICT
> it doesn't leak, but it's not entirely straightforward.

When an error occurs either ascending or descending the tree, we end up here
with 'level' containing the number of levels that need to be cleaned up.  It
might be clearer if it was called 'num_levels', though that could be confused
with 'params->num_levels'.  Or we could use: 'while (level-- > 0)'.

This is unrelated to this patch though.

- Eric
