Hi Eric, On Tue, Apr 04, 2023 at 04:32:24PM -0700, Eric Biggers wrote: > Hi Andrey, > > On Tue, Apr 04, 2023 at 04:53:17PM +0200, Andrey Albershteyn wrote: > > In case of different Merkle tree block size fs-verity expects > > ->read_merkle_tree_page() to return Merkle tree page filled with > > Merkle tree blocks. The XFS stores each merkle tree block under > > extended attribute. Those attributes are addressed by block offset > > into Merkle tree. > > > > This patch make ->read_merkle_tree_page() to fetch multiple merkle > > tree blocks based on size ratio. Also the reference to each xfs_buf > > is passed with page->private to ->drop_page(). > > > > Signed-off-by: Andrey Albershteyn <aalbersh@xxxxxxxxxx> > > --- > > fs/xfs/xfs_verity.c | 74 +++++++++++++++++++++++++++++++++++---------- > > fs/xfs/xfs_verity.h | 8 +++++ > > 2 files changed, 66 insertions(+), 16 deletions(-) > > > > diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c > > index a9874ff4efcd..ef0aff216f06 100644 > > --- a/fs/xfs/xfs_verity.c > > +++ b/fs/xfs/xfs_verity.c > > @@ -134,6 +134,10 @@ xfs_read_merkle_tree_page( > > struct page *page = NULL; > > __be64 name = cpu_to_be64(index << PAGE_SHIFT); > > uint32_t bs = 1 << log_blocksize; > > + int blocks_per_page = > > + (1 << (PAGE_SHIFT - log_blocksize)); > > + int n = 0; > > + int offset = 0; > > struct xfs_da_args args = { > > .dp = ip, > > .attr_filter = XFS_ATTR_VERITY, > > @@ -143,26 +147,59 @@ xfs_read_merkle_tree_page( > > .valuelen = bs, > > }; > > int error = 0; > > + bool is_checked = true; > > + struct xfs_verity_buf_list *buf_list; > > > > page = alloc_page(GFP_KERNEL); > > if (!page) > > return ERR_PTR(-ENOMEM); > > > > - error = xfs_attr_get(&args); > > - if (error) { > > - kmem_free(args.value); > > - xfs_buf_rele(args.bp); > > + buf_list = kzalloc(sizeof(struct xfs_verity_buf_list), GFP_KERNEL); > > + if (!buf_list) { > > put_page(page); > > - return ERR_PTR(-EFAULT); > > + return ERR_PTR(-ENOMEM); > > } > > > > - if (args.bp->b_flags & XBF_VERITY_CHECKED) > > + /* > > + * Fill the page with Merkle tree blocks. The blcoks_per_page is higher > > + * than 1 when fs block size != PAGE_SIZE or Merkle tree block size != > > + * PAGE SIZE > > + */ > > + for (n = 0; n < blocks_per_page; n++) { > > + offset = bs * n; > > + name = cpu_to_be64(((index << PAGE_SHIFT) + offset)); > > + args.name = (const uint8_t *)&name; > > + > > + error = xfs_attr_get(&args); > > + if (error) { > > + kmem_free(args.value); > > + /* > > + * No more Merkle tree blocks (e.g. this was the last > > + * block of the tree) > > + */ > > + if (error == -ENOATTR) > > + break; > > + xfs_buf_rele(args.bp); > > + put_page(page); > > + kmem_free(buf_list); > > + return ERR_PTR(-EFAULT); > > + } > > + > > + buf_list->bufs[buf_list->buf_count++] = args.bp; > > + > > + /* One of the buffers was dropped */ > > + if (!(args.bp->b_flags & XBF_VERITY_CHECKED)) > > + is_checked = false; > > + > > + memcpy(page_address(page) + offset, args.value, args.valuelen); > > + kmem_free(args.value); > > + args.value = NULL; > > + } > > I was really hoping for a solution where the cached data can be used directly, > instead allocating a temporary page and copying the cached data into it every > time the cache is accessed. The problem with what you have now is that every > time a single 32-byte hash is accessed, a full page (potentially 64KB!) will be > allocated and filled. That's not very efficient. The need to allocate the > temporary page can also cause ENOMEM (which will get reported as EIO). > > Did you consider alternatives that would work more efficiently? I think it > would be worth designing something that works properly with how XFS is planned > to cache the Merkle tree, instead of designing a workaround. > ->read_merkle_tree_page was not really designed for what you are doing here. > > How about replacing ->read_merkle_tree_page with a function that takes in a > Merkle tree block index (not a page index!) and hands back a (page, offset) pair > that identifies where the Merkle tree block's data is located? Or (folio, > offset), I suppose. > > With that, would it be possible to directly return the XFS cache? > > - Eric > Yeah, I also don't like it, I didn't want to change fs-verity much so went with this workaround. But as it's ok, I will look into trying to pass xfs buffers to fs-verity without direct use of ->read_merkle_tree_page(). I think it's possible with (folio, offset), the xfs buffers aren't xattr value align so the 4k merkle tree block is stored in two pages. Thanks for suggestion! -- - Andrey