On Wed, May 22, 2024 at 07:37:23AM -0700, Christoph Hellwig wrote:
> On Mon, May 20, 2024 at 09:02:59AM -0700, Darrick J. Wong wrote:
> > On Mon, May 20, 2024 at 05:39:59AM -0700, Christoph Hellwig wrote:
> > > On Fri, May 17, 2024 at 10:17:20AM -0700, Darrick J. Wong wrote:
> > > > > Note that the verity metadata *must* be encrypted when the file is,
> > > > > since it contains hashes of the plaintext data.
> > > >
> > > > Refresh my memory of fscrypt -- does it encrypt directory names, xattr
> > > > names, and xattr values too?  Or does it only do that to file data?
> > >
> > > It does encrypt the file names in the directories, but nothing in
> > > xattrs as far as I can tell.
> >
> > Do we want that for user.* attrs?  That seems like quite an omission.
>
> I'll let Eric answer that.  Btw, is the threat model for fscrypt written
> down somewhere?

See https://www.kernel.org/doc/html/latest/filesystems/fscrypt.html?highlight=fscrypt#threat-model

As for why it stopped at file contents and names (and fsverity Merkle
tree blocks, which ext4 and f2fs encrypt in the same way as contents),
it's just because it's very difficult to add more, and file contents and
names alone were enough for parity with most other file-level encryption
systems.

That's just the nature of file-level encryption.  For each additional
type of data or metadata that's encrypted, there are a huge number of
things that need to be resolved, including algorithm selection, key
derivation, IV selection, on-disk format, padding, UAPI for enabling the
feature, userspace tool support including fsck and debugging tools,
access semantics without the key, etc.

xattr encryption is definitely something that people have thought about,
and it probably would be the next thing to consider after the existing
contents and names.  Encrypting the exact file sizes is also something
to consider.  But it's not something that someone has volunteered to do
all the work for (yet).
If you restricted it to the contents of user xattrs only (not other
xattrs, and not xattr names), it would be more feasible than trying to
encrypt the names and values of all xattrs, though it would still be
difficult.

Of course, generally speaking, when adding fscrypt support to a
filesystem, it's going to be much easier to just target the existing
feature set, and not try to include new, unproven features too.  (FWIW,
this also applies to fsverity.)  If someone is interested in taking on
an experimental project to add xattr encryption support, I'd be glad to
try to provide guidance, but it probably should be separated out from
adding fscrypt support in the first place.

> > > > And if we copy the ext4 method of putting the merkle data after eof and
> > > > loading it into the pagecache, how much of the generic fs/verity cleanup
> > > > patches do we really need?
> > >
> > > We shouldn't need anything.  A bunch of cleanup
> >
> > Should we do the read/drop_merkle_tree_block cleanup anyway?
>
> To me the block based interface seems a lot cleaner, but Eric has some
> reservations due to the added indirect call on the drop side.

The point of the block-based interface is really so that filesystems
could actually do the block-based caching.  If XFS isn't going to end up
using that after all, I think we should just stay with
->read_merkle_tree_page for now.

> > One of the advantages of xfs caching merkle tree blocks ourselves
> > is that we neither extend the usage of PageChecked when merkle blocksize
> > == pagesize nor become subject to the 1-million merkle block limit when
> > merkle blocksize < pagesize.  There's a tripping hazard if you mount a 4k
> > merkle block filesystem on a computer with 64k pages -- now you can't
> > open 6T verity files.
> >
> > That said, it also sounds dumb to maintain a separate index for
> > pagecache pages to track a single bit.
>
> Yeah.  As I mentioned earlier I think fsverity really should enforce
> a size limit.
> Right now it will simply run out of space eventually, which
> doesn't seem like a nice failure mode.

You could enforce the limit of 1 << 23 Merkle tree blocks regardless of
whether the block size and page size are equal, if you want.  This
probably would need to be specific to XFS, though, as it would be a
breaking change for other filesystems.

As for ext4 and f2fs failing with EFBIG in the middle of building the
Merkle tree due to it exceeding s_maxbytes: yes, that could be detected
at the beginning of building the tree to get the error right away.  This
just isn't done currently because the current way is simpler, and this
case isn't really encountered in practice, so there isn't too much
reason to optimize it.

- Eric