Re: [RFC] Directly mapped xattr data & fs-verity

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Thu, 9 Jan 2025 09:03:56 -0800

On Thu, Jan 09, 2025 at 08:44:03AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 08, 2025 at 11:39:08PM -0800, Darrick J. Wong wrote:
> > > > 
> > > > "Maybe we can use it for $HANDWAVE" is not a good idea.
> > > 
> > > > Hash based verification works poorly for mutable files, so we'd
> > > > rather have a really good argument for that.
> > > 
> > > hmm, why? Not sure I have an understanding of this
> > 
> > Me neither.  I can see how you might design file data block checksumming
> > to be basically an array of u32 crc[nblocks][2].  Then if you turned on
> > stable folios for writeback, the folio contents can't change so you can
> > compute the checksum of the new data, run a transaction to set
> > crc[nblock][0] to the old checksum; crc[nblock][1] to the new checksum;
> > and only then issue the writeback bio.
> 
> Are you (plural) talking about hash-based integrity protection a la
> fsverity, or checksums?  While they look similar in some ways, those are
> totally different things!  If we're talking about "simple" data
> checksums, both post-EOF data blocks and xattrs are really badly wrong,
> as the checksum needs to be associated with the physical block due to
> reflinks, not the file.  The natural way to implement them for XFS
> if we really wanted them would be a new per-AG/RTG metabtree that
> is indexed by the agblock/rgblock.

Agreed.  For simple things like crc32 I would very much rather we stuff
them in a per-group btree because we only have to store the crc once in
the filesystem and now it protects all owners of that block.  In theory
the double-crc scheme would work fine for untorn data block writes, I
think.
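
To make the double-crc idea concrete, here is a rough userspace sketch,
not kernel code and not from any patch.  Everything named toy_* is
made up for illustration: a flat array stands in for the per-group
btree indexed by agblock, a bitwise CRC-32 stands in for whatever crc
we would really pick, and the "finish" step that drops the old
checksum after write completion is my assumption about how the scheme
would be closed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_AG_BLOCKS	64	/* hypothetical blocks per allocation group */

/* Hypothetical record: one per physical block, shared by all owners. */
struct toy_csum_rec {
	uint32_t	crc[2];	/* [0] = old contents, [1] = new contents */
};

/* Flat array standing in for a per-AG btree indexed by agblock. */
static struct toy_csum_rec toy_csum_tree[TOY_AG_BLOCKS];

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), used as a stand-in. */
static uint32_t toy_crc32(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= p[i];
		for (int k = 0; k < 8; k++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return ~crc;
}

static struct toy_csum_rec *toy_csum_lookup(uint32_t agbno)
{
	assert(agbno < TOY_AG_BLOCKS);
	return &toy_csum_tree[agbno];
}

/*
 * "Transaction" run before issuing the writeback bio: slot 0 keeps the
 * checksum of what is on disk now, slot 1 records what is about to land.
 */
static void toy_csum_stage(uint32_t agbno, const void *newdata, size_t len)
{
	toy_csum_lookup(agbno)->crc[1] = toy_crc32(newdata, len);
}

/* Assumed completion step: the new checksum becomes the only valid one. */
static void toy_csum_finish(uint32_t agbno)
{
	struct toy_csum_rec *rec = toy_csum_lookup(agbno);

	rec->crc[0] = rec->crc[1];
}

/* A block is acceptable if it matches either the old or the new checksum. */
static bool toy_csum_verify(uint32_t agbno, const void *data, size_t len)
{
	const struct toy_csum_rec *rec = toy_csum_lookup(agbno);
	uint32_t crc = toy_crc32(data, len);

	return crc == rec->crc[0] || crc == rec->crc[1];
}
```

After a crash between stage and finish, verification passes for a
block that is entirely old or entirely new and fails for a mix of the
two, which is why this only helps when the device does not tear the
write.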

I only see a reason for per-file hash structures in the dabtree if the
hashes themselves have some sort of per-file configuration (like
distributor-signed merkle trees or whatever).  I asked Eric Biggers if
he had any plans for mutable fsverity files and he said no.

> > But I don't think that works if you crash.  At least one of the
> > checksums might be right if the device doesn't tear the write, but that
> > gets us tangled up in the untorn block writes patches.  If the device
> > does not guarantee untorn writes, then you probably have to do it the
> > way the other checksumming fses do it -- write to a new location, then
> > run a transaction to store the checksum and update the file mapping.
> 
> Yes.  That's why for data checksums you'd always need to either write
> out of place (as with the pending zoned allocator) or work with intent /
> intent done items.  That's assuming you can't offload the atomicity to the
> device by using T10 PI or at least per-block metadata that stores the
> checksum.  Which would also remove the need for any new file system
> data structure, but require enterprise hardware that supports PI or
> metadata.

<nod>
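
For the out-of-place variant, the ordering I have in mind looks
roughly like the toy model below.  It is a userspace sketch only; the
toy_* names, the in-memory "disk", and the bump allocator are all
invented for illustration, and freeing the old block is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLKSZ	16
#define TOY_NPHYS	8
#define TOY_NLOGICAL	4
#define TOY_NOBLOCK	UINT32_MAX

struct toy_fs {
	uint8_t		disk[TOY_NPHYS][TOY_BLKSZ];	/* physical blocks */
	uint32_t	csum[TOY_NPHYS];		/* keyed by physical block */
	uint32_t	map[TOY_NLOGICAL];		/* file offset -> physical */
	uint32_t	next_free;			/* trivial bump allocator */
};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), used as a stand-in. */
static uint32_t toy_crc32(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= p[i];
		for (int k = 0; k < 8; k++)
			crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
	}
	return ~crc;
}

static void toy_init(struct toy_fs *fs)
{
	memset(fs, 0, sizeof(*fs));
	for (int i = 0; i < TOY_NLOGICAL; i++)
		fs->map[i] = TOY_NOBLOCK;
}

/* Step 1: write the data to a new location that nothing references yet. */
static uint32_t toy_write_new(struct toy_fs *fs, const uint8_t *data)
{
	assert(fs->next_free < TOY_NPHYS);
	uint32_t pblk = fs->next_free++;

	memcpy(fs->disk[pblk], data, TOY_BLKSZ);
	return pblk;
}

/* Step 2: one "transaction" stores the checksum and updates the mapping. */
static void toy_commit(struct toy_fs *fs, uint32_t lblk, uint32_t pblk,
		       const uint8_t *data)
{
	assert(lblk < TOY_NLOGICAL && pblk < TOY_NPHYS);
	fs->csum[pblk] = toy_crc32(data, TOY_BLKSZ);
	fs->map[lblk] = pblk;
}

/* Read through the mapping and verify against the per-block checksum. */
static bool toy_read(const struct toy_fs *fs, uint32_t lblk, uint8_t *out)
{
	assert(lblk < TOY_NLOGICAL);
	uint32_t pblk = fs->map[lblk];

	if (pblk == TOY_NOBLOCK)
		return false;
	if (toy_crc32(fs->disk[pblk], TOY_BLKSZ) != fs->csum[pblk])
		return false;
	memcpy(out, fs->disk[pblk], TOY_BLKSZ);
	return true;
}
```

If we crash after step 1 but before step 2, the mapping still points
at the old block and its checksum still matches, so a torn write to
the new location is never visible.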

--D