Hi Andreas,

On Mon, Nov 05, 2018 at 02:05:24PM -0700, Andreas Dilger wrote:
> On Nov 1, 2018, at 4:52 PM, Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> >
> > From: Eric Biggers <ebiggers@xxxxxxxxxx>
> >
> > Add basic fs-verity support to ext4. fs-verity is a filesystem feature
> > that enables transparent integrity protection and authentication of
> > read-only files. It uses a dm-verity like mechanism at the file level:
> > a Merkle tree is used to verify any block in the file in log(filesize)
> > time. It is implemented mainly by helper functions in fs/verity/.
> > See Documentation/filesystems/fsverity.rst for details.
> >
> > This patch adds everything except the data verification hooks that will
> > needed in ->readpages().
> >
> > On ext4, enabling fs-verity on a file requires that the filesystem has
> > the 'verity' feature, e.g. that it was formatted with
> > 'mkfs.ext4 -O verity' or had 'tune2fs -O verity' run on it.
> > This requires e2fsprogs 1.44.4-2 or later.
> >
> > In ext4, we choose to retain the fs-verity metadata past the end of the
> > file rather than trying to move it into an external inode xattr, since
> > in practice keeping the metadata in-line actually results in the
> > simplest and most efficient implementation. One non-obvious advantage
> > of keeping the verity metadata in-line is that when fs-verity is
> > combined with fscrypt, the verity metadata naturally gets encrypted too;
> > this is actually necessary because it contains hashes of the plaintext.
>
> On the plus side, this means that the verity data will automatically be
> invalidated if the file is truncated or extended, but on the negative side
> it means that the verity Merkle tree needs to be recalculated for the
> entire file if e.g. the file is appended to.
>
> I guess the current implementation will generate the Merkle tree in
> userspace, but at some point it might be useful to generate it on-the-fly
> to have proper data integrity from the time of write (e.g.
> like ZFS) rather than only allowing it to be stored after the entire file
> is written?
>
> Storing the Merkle tree in a large xattr inode would allow this to change
> in the future rather than being stuck with the current implementation. We
> could encrypt the xattr data just as easily as the file data (which should
> be done anyway even for non-verity files to avoid leaking data), and having
> the verity attr keyed to the inode version/size/mime(?) would ensure the
> kernel knows it is stale if the inode is modified.
>
> I'm not going to stand on my head and block this implementation, I just
> thought it is worthwhile to raise these issues now rather than after it
> is a fait accompli.
>

That would actually be the least of the problems for adding write support.
Adding write support would require at least:

- A way to maintain consistency between the data and hashes, including all
  levels of hashes, since corruption after a crash (especially of
  potentially the entire file!) is unacceptable. The main options for
  solving this are data journalling, copy-on-write, and log-structured
  volume. But it's very hard to retrofit existing filesystems with new
  consistency mechanisms. Data journalling can always be used, but is very
  slow.

- An on-disk format that allows dynamically growing/shrinking each level of
  the Merkle tree; or, using a different authenticated dictionary structure,
  such as an authenticated skiplist rather than a Merkle tree. This would
  drastically increase the complexity over a regular Merkle tree.

Compare it to dm-verity vs. dm-integrity. dm-verity is read-only and very
simple; the kernel just uses a Merkle tree that is generated by userspace.
On the other hand, dm-integrity supports writes but is slow, much more
complex, and doesn't even actually do full-device authentication since it
authenticates each sector independently, i.e. there is no Merkle tree.
I don't think it would make sense for the same device-mapper target to
support these quite different use cases.

And the same general concepts apply at the filesystem level; for these
reasons and others (note that per-block checksums like btrfs and ZFS
wouldn't need a Merkle tree), write support is very intentionally outside
the scope of fs-verity. So I think any arguments for doing things
differently in fs-verity need to be made in the context of read-only files.

Thanks,

Eric
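
As an illustration of the two requirements listed above (keeping data and
every level of hashes consistent, and letting each level grow), here is a
minimal Python sketch. It is not fs-verity's code or on-disk format: the
binary tree, the 4096-byte block size, SHA-256, and the helper names
build_levels(), auth_path() and verify_block() are all assumptions chosen
to keep the example short.

```python
import hashlib
from typing import List, Optional

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(data: bytes, block_size: int = BLOCK_SIZE) -> List[List[bytes]]:
    """Return every level of a binary Merkle tree: leaf hashes first, root last."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    levels = [[_hash(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Hash adjacent pairs; an unpaired last node is hashed on its own.
        levels.append([_hash(b"".join(prev[i:i + 2])) for i in range(0, len(prev), 2)])
    return levels


def auth_path(levels: List[List[bytes]], index: int) -> List[Optional[bytes]]:
    """Collect the sibling hash at each level (None where there is no sibling)."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append(level[sibling] if sibling < len(level) else None)
        index //= 2
    return path


def verify_block(block: bytes, index: int,
                 path: List[Optional[bytes]], root: bytes) -> bool:
    """Recompute the root from one block and its sibling path."""
    node = _hash(block)
    for sibling in path:
        if sibling is None:
            node = _hash(node)
        elif index % 2 == 0:
            node = _hash(node + sibling)
        else:
            node = _hash(sibling + node)
        index //= 2
    return node == root


# Eight distinct blocks, then the same file with one block appended.
data = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
levels = build_levels(data)
root = levels[-1][0]

block5 = data[5 * BLOCK_SIZE:6 * BLOCK_SIZE]
path5 = auth_path(levels, 5)
print("block 5 verifies:", verify_block(block5, 5, path5, root))

appended = build_levels(data + b"x" * BLOCK_SIZE)
print("levels before/after append:", len(levels), len(appended))
print("root changed:", appended[-1][0] != root)
```

Verifying one block touches one sibling per level, which is the
log(filesize) property from the patch description. After the append, the
tree has an extra level and a different root, so the data block and the
hash nodes above it would all have to reach disk consistently for the file
to stay verifiable after a crash.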