Re: [PATCH 18/26] xfs: use merkle tree offset as attr hash

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Tue, 7 May 2024 14:24:54 -0700

On Wed, May 01, 2024 at 12:23:02AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 30, 2024 at 11:53:00PM -0700, Christoph Hellwig wrote:
> > This and the header hacks suggest to me that shoehorning the fsverity
> > blocks into attrs just feels like the wrong approach.
> > 
> > They don't really behave like attrs, they aren't key/value pairs that
> > are separate, but a large amount of same sized blocks with logical
> > indexing.  All that is actually nicely solved by the original fsverity
> > used by ext4/f2fs, while we have to pile workarounds on top of
> > workarounds to make attrs work.
> 
> Taking this a bit further:  If we want to avoid the problems associated
> with the original scheme, mostly the file size limitation, and the (IMHO
> more cosmetic than real) confusion with post-EOF preallocations, we
> can still store the data in the attr fork, but not in the traditional
> attr format.  The attr fork provides the same logical index to physical
> translation as the data fork, and while that is currently only used for
> dabtree blocks and remote attr values, that isn't actually a fundamental
> requirement for using it.
> 
> All the attr fork placement works through xfs_bmap_first_unused() to
> find completely random free space in the logical address space.
> 
> Now if we reserved say the high bit for verity blocks in verity enabled
> file systems we can simply use the bmap btree to do the mapping from
> the verity index to the on-disk verity blocks without any other impact
> to the attr code.

Since we know the size of the merkle data ahead of time, we could also
preallocate space in the attr fork and create a remote ATTR_VERITY xattr
named "merkle" that points to the allocated space.  Then we don't have
to have magic meanings for the high bit.

Though I guess the question is, given the format:

struct xfs_attr_leaf_name_remote {
	__be32	valueblk;		/* block number of value bytes */
	__be32	valuelen;		/* number of bytes in value */
	__u8	namelen;		/* length of name bytes */
	/*
	 * In Linux 6.5 this flex array was converted from name[1] to name[].
	 * Be very careful here about extra padding at the end; see
	 * xfs_attr_leaf_entsize_remote() for details.
	 */
	__u8	name[];			/* name bytes */
};

Will we ever have a merkle tree larger than 2^32-1 bytes in length?  If
that's possible, then either we shard the merkle tree, or we have to rev
the ondisk xfs_attr_leaf_name_remote structure.
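
A rough sketch of the arithmetic behind that question, assuming a tree
built the way fsverity describes it (each tree block packs
block_size / digest_size hashes, and levels stack up until one block is
left).  merkle_tree_bytes() is a hypothetical helper for illustration,
not a kernel function, and the 4096-byte block and 32-byte (SHA-256)
digest used below are assumptions:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of the Merkle tree size for a file of file_size bytes.
 * Hypothetical helper for illustration only, not a kernel function.
 */
static uint64_t merkle_tree_bytes(uint64_t file_size, uint32_t block_size,
				  uint32_t digest_size)
{
	uint64_t hashes_per_block, blocks, total = 0;

	/* Need at least two hashes per block or the tree never shrinks. */
	assert(digest_size > 0 && block_size >= 2 * digest_size);
	hashes_per_block = block_size / digest_size;

	/* Number of data blocks, rounding up. */
	blocks = file_size / block_size + (file_size % block_size != 0);

	/* Add one level at a time until a single block remains. */
	while (blocks > 1) {
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		total += blocks;
	}
	return total * block_size;
}
```

Under those assumptions the tree is roughly 1/127th of the file size, so
merkle_tree_bytes(1ULL << 40, 4096, 32) comes out a little over 8GiB,
and a 32-bit valuelen would stop being enough somewhere around half a
terabyte of file data.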

I think we have to rev the format anyway, since with nrext64==1 we can
have attr fork extents that start above 2^32 blocks, and the codebase
will blindly truncate the 64-bit quantity returned by
xfs_bmap_first_unused.
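
To illustrate the truncation concern in isolation, here is a minimal
sketch in plain C.  Both helpers are hypothetical stand-ins, not XFS
functions: blind_truncate() models storing a 64-bit file offset into a
32-bit field like valueblk without checking, and dablk_fits_u32() models
the range check one might want instead:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: models assigning a 64-bit offset to a 32-bit field. */
static uint32_t blind_truncate(uint64_t fileoff)
{
	return (uint32_t)fileoff;	/* upper 32 bits silently dropped */
}

/*
 * Hypothetical: refuse offsets that cannot be represented in 32 bits
 * instead of truncating them.  Returns true and fills *out on success.
 */
static bool dablk_fits_u32(uint64_t fileoff, uint32_t *out)
{
	if (fileoff > UINT32_MAX)
		return false;
	*out = (uint32_t)fileoff;
	return true;
}
```

With an offset of 2^32 + 5, blind_truncate() returns 5, which would
point at a completely different block, while dablk_fits_u32() rejects
the value.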

--D