Re: [PATCH v2 10/12] ext4: add basic fs-verity support

Eric Biggers <ebiggers@xxxxxxxxxx> · Mon, 5 Nov 2018 17:11:42 -0800

Hi Andreas,

On Mon, Nov 05, 2018 at 02:05:24PM -0700, Andreas Dilger wrote:
> On Nov 1, 2018, at 4:52 PM, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> > 
> > From: Eric Biggers <ebiggers@xxxxxxxxxx>
> > 
> > Add basic fs-verity support to ext4.  fs-verity is a filesystem feature
> > that enables transparent integrity protection and authentication of
> > read-only files.  It uses a dm-verity like mechanism at the file level:
> > a Merkle tree is used to verify any block in the file in log(filesize)
> > time.  It is implemented mainly by helper functions in fs/verity/.
> > See Documentation/filesystems/fsverity.rst for details.
> > 
> > This patch adds everything except the data verification hooks that will
> > be needed in ->readpages().
> > 
> > On ext4, enabling fs-verity on a file requires that the filesystem has
> > the 'verity' feature, e.g. that it was formatted with
> > 'mkfs.ext4 -O verity' or had 'tune2fs -O verity' run on it.
> > This requires e2fsprogs 1.44.4-2 or later.
> > 
> > In ext4, we choose to retain the fs-verity metadata past the end of the
> > file rather than trying to move it into an external inode xattr, since
> > in practice keeping the metadata in-line actually results in the
> > simplest and most efficient implementation.  One non-obvious advantage
> > of keeping the verity metadata in-line is that when fs-verity is
> > combined with fscrypt, the verity metadata naturally gets encrypted too;
> > this is actually necessary because it contains hashes of the plaintext.
> 
> On the plus side, this means that the verity data will automatically be
> invalidated if the file is truncated or extended, but on the negative side
> it means that the verity Merkle tree needs to be recalculated for the
> entire file if e.g. the file is appended to.
> 
> I guess the current implementation will generate the Merkle tree in
> userspace, but at some point it might be useful to generate it on-the-fly
> to have proper data integrity from the time of write (e.g. like ZFS)
> rather than only allowing it to be stored after the entire file is written?
> 
> Storing the Merkle tree in a large xattr inode would allow this to change
> in the future rather than being stuck with the current implementation.  We
> could encrypt the xattr data just as easily as the file data (which should
> be done anyway even for non-verity files to avoid leaking data), and having
> the verity attr keyed to the inode version/size/mtime(?) would ensure the
> kernel knows it is stale if the inode is modified.
> 
> I'm not going to stand on my head and block this implementation, I just
> thought it is worthwhile to raise these issues now rather than after it
> is a fait accompli.
> 

That would actually be the least of the problems for adding write support.
Adding write support would require at least:

- A way to maintain consistency between the data and hashes, including all
  levels of hashes, since corruption after a crash (potentially of the entire
  file!) is unacceptable.  The main options for solving this are data
  journalling, copy-on-write, and log-structured volumes.  But it's very hard to
  retrofit existing filesystems with new consistency mechanisms.  Data
  journalling can always be used, but is very slow.

- An on-disk format that allows dynamically growing/shrinking each level of the
  Merkle tree; or, alternatively, a different authenticated dictionary
  structure, such as an authenticated skip list rather than a Merkle tree.  This
  would drastically increase the complexity over a regular Merkle tree.
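To illustrate the second point: with a plain Merkle tree, the number of levels
and the size of each level are fixed by the file size, so even a one-block
append can reshape the whole tree.  Below is a minimal sketch, assuming
4096-byte blocks and 32-byte SHA-256 digests (illustrative parameters, not taken
from this patch); the function name merkle_level_sizes is hypothetical.

```python
# Illustrative parameters (assumed, not from the patch).
BLOCK_SIZE = 4096
DIGEST_SIZE = 32
HASHES_PER_BLOCK = BLOCK_SIZE // DIGEST_SIZE  # 128 hashes fit in one tree block


def merkle_level_sizes(data_blocks: int) -> list[int]:
    """Return the number of hash blocks in each tree level, lowest level first.

    Each level holds the hashes of the level below it; levels are added until
    one level fits in a single block (the block whose hash is the root hash).
    """
    levels = []
    n = data_blocks
    while True:
        n = -(-n // HASHES_PER_BLOCK)  # ceiling division: blocks needed for n hashes
        levels.append(n)
        if n <= 1:
            break
    return levels


for blocks in (128, 129, 16384, 16385):
    print(blocks, "data blocks ->", merkle_level_sizes(blocks))
```

Under these assumptions, going from 128 to 129 data blocks turns a one-level
tree into a two-level one, and going from 16384 to 16385 blocks adds a third
level, which is why supporting writes would need an on-disk format in which
every level can grow and shrink.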

Compare it to dm-verity vs. dm-integrity.  dm-verity is read-only and very
simple; the kernel just uses a Merkle tree that is generated by userspace.
On the other hand, dm-integrity supports writes but is slow, much more complex,
and doesn't actually provide full-device authentication, since it authenticates
each sector independently, i.e. there is no Merkle tree.

I don't think it would make sense for the same device-mapper target to support
these quite different use cases.  And the same general concepts apply at the
filesystem level.  For these reasons and others (note that per-block checksums
like those in btrfs and ZFS wouldn't need a Merkle tree), write support is very
intentionally outside the scope of fs-verity.
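For comparison, the read-only case that dm-verity and fs-verity target stays
simple: one trusted root hash is enough to check any single block using only a
logarithmic number of sibling hashes.  Below is a minimal sketch of that
verification using SHA-256 from Python's hashlib.  For brevity it uses a binary
tree rather than packing many hashes into each tree block as dm-verity and
fs-verity do, and the helper names build_levels, auth_path and verify_block are
hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """Build a binary Merkle tree; levels[0] holds leaf hashes, levels[-1] the root."""
    levels = [[sha256(b) for b in blocks]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        # Hash adjacent pairs; an odd trailing node is paired with itself
        # (an arbitrary convention chosen for this sketch).
        nxt = [sha256(cur[i] + cur[min(i + 1, len(cur) - 1)])
               for i in range(0, len(cur), 2)]
        levels.append(nxt)
    return levels


def auth_path(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Collect the sibling hash at each level below the root for leaf `index`."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append(level[min(sibling, len(level) - 1)])
        index //= 2
    return path


def verify_block(block: bytes, index: int, path: list[bytes], root: bytes) -> bool:
    """Recompute the root from one block plus its O(log n) sibling hashes."""
    node = sha256(block)
    for sibling in path:
        # Left children are hashed as (node || sibling), right ones as (sibling || node).
        node = sha256(node + sibling) if index % 2 == 0 else sha256(sibling + node)
        index //= 2
    return node == root


blocks = [bytes([i]) * 4096 for i in range(5)]
levels = build_levels(blocks)
root = levels[-1][0]
path = auth_path(levels, 3)
print(len(path), verify_block(blocks[3], 3, path, root))
```

With 5 blocks the tree has 4 levels, so checking block 3 needs only 3 sibling
hashes, and any change to that block's contents makes the recomputed root
differ from the trusted one.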

So I think any arguments for doing things differently in fs-verity need to be
made in the context of read-only files.

Thanks,

Eric