[starting a separate thread to not hijack the fs-verity submission]

Eric Biggers wrote:
> In theory it would be a much cleaner design to store verity metadata
> separately from the data. But the Merkle tree can be very large.
> For example, a 1 GB file using SHA-512 would have a 16.6 MB Merkle tree.
> So the Merkle tree can't be an extended attribute, since the xattrs API
> requires xattrs to be small (<= 64 KB), and most filesystems further limit
> xattr sizes in their on-disk format to as little as 4 KB. Furthermore,
> even if both of these limits were to be increased, the xattrs functions
> (both the syscalls, and the internal functions that filesystems have)
> are all based around getting/setting the entire xattr value.
>
> Also when used with fscrypt, we want the Merkle tree and
> fsverity_descriptor to be encrypted, so they don't leak plaintext
> hashes. And we want the Merkle tree to be paged into memory, just like
> the file contents, to take advantage of the usual Linux memory management.
>
> What we really need is *streams*, like NTFS has. But the filesystems
> we're targeting don't support streams, nor does the Linux syscall
> interface have any API for accessing streams, nor does the VFS support
> them.
>
> Adding streams support to all those things would be a huge multi-year
> effort, controversial, and almost certainly not worth it just for
> fs-verity.

There are, of course, other clients for file streams. Samba is one,
GNOME could use streams for various desktoppy things, and I'm certain
other users would come out of the woodwork if we had them.

Let's go over the properties of a file stream:

- It has no life independent of the file it's attached to; you can't
  move it from one file to another
- If the file is deleted, it is also deleted
- If the file is renamed, it travels with the file
- If the file is copied, the copying program decides whether any named
  streams are copied along with it.
- Can be created, deleted. Can be renamed?
- Openable, seekable, cachable
- Does not have sub-streams of its own
- Directories may also have streams which are distinct from the files
  in the directory
- Can pipes / sockets / device nodes / symlinks / ... have streams?
  Unclear. Probably not useful.

NTFS, UDF and SMB all support streams already. Microsoft opted to include
the functionality in ReFS (which dropped some of the less-used
functionality of NTFS), so it's clearly useful.

Here's my proposed syscall API for this:

openat()
    To access a named stream, we need to be able to get a file descriptor
    for it. The new openat() syscall seems like the best way to
    accomplish this; specify a file descriptor, a new AT_NAMED_STREAM
    flag and a filename, and the last component of the filename will be
    treated as the name of the stream within the object. This permits
    us to distinguish between a named stream on a directory and a file
    within a directory.

fstat()
    st_ino may be different for different names. st_dev may be different.
    st_mode will match the object for files, even if it is changed after
    creation. For directories, it will match except that execute
    permission will be removed and the S_IFMT bits will be S_IFREG (do
    we want to define a new S_IFSTRM?). st_nlink will be 1. st_uid and
    st_gid will match. It will have its own st_atime/st_mtime/st_ctime.
    Accessing a stream will not update its parent's atime/mtime/ctime.

mmap(), read(), write(), close(), splice(), sendfile(), fallocate(),
ftruncate(), dup(), dup2(), dup3(), utimensat(), futimens(), select(),
poll(), lseek(), fcntl() (F_DUPFD, F_GETFD, F_GETFL, F_SETFL, F_SETLK,
F_SETLKW, F_GETLK, F_GETOWN, F_SETOWN, F_GETSIG, F_SETSIG, F_SETLEASE,
F_GETLEASE)
    These system calls work as expected.

linkat(), symlinkat(), mknodat(), mkdirat()
    These system calls will return -EPERM.

renameat()
    If olddirfd + oldpath refers to a stream then newdirfd + newpath must
    refer to a stream within the same parent object. If that stream
    exists, it is removed.
    If olddirfd + oldpath does not refer to a stream, then newdirfd +
    newpath must not refer to a stream.

    When renaming a stream, the two file specifications must resolve to
    the same parent object. It is possible to use renameat() to rename
    a stream within an object, but not to move a stream from one object
    to another. If newpath refers to an existing named stream, it is
    removed.

unlinkat()
    This is how you remove an individual named stream.

unlink()
    Unlinking a file with named streams removes all named streams from
    that file and then unlinks the file. Open streams will continue to
    exist in the filesystem until they are closed, just as unlinked
    files do.

link(), rename()
    Renaming or linking to a file with named streams does not affect
    the streams.

We may need a new system call for enumerating the streams associated with
a file or directory. We can't use getdents() because there's no way to
distinguish between wanting to read the contents of a directory and the
named streams on a directory.

For shell programming, I would suggest a new program:

    strcat [FILE] [STREAM]...

which opens [FILE], then each named stream within that file, concatenating
said STREAMs to stdout. We probably need a strls too.
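
To make the renameat() rules above a little more concrete, here is a
rough userspace model in Python. It is only a sketch of the proposed
semantics, not kernel code: the Target tuple, the function names, and the
dict standing in for on-disk streams are all made up for illustration,
and the errno values (-EINVAL, -EXDEV, -ENOENT) are my assumptions, since
the proposal does not say which errors should be returned.

```python
import errno
from typing import Dict, NamedTuple, Optional


class Target(NamedTuple):
    """What a (dirfd, path) pair resolved to. Hypothetical model type."""
    parent: str             # identity of the file or directory object
    stream: Optional[str]   # stream name, or None for the object itself


def check_stream_rename(old: Target, new: Target) -> int:
    """Return 0 if the proposed rules allow the rename, else -errno."""
    if (old.stream is None) != (new.stream is None):
        # A stream cannot become a file, nor a file a stream.
        return -errno.EINVAL    # assumed error code
    if old.stream is not None and old.parent != new.parent:
        # Streams never move from one object to another.
        return -errno.EXDEV     # assumed error code
    return 0


def rename_stream(streams: Dict[str, Dict[str, bytes]],
                  old: Target, new: Target) -> int:
    """Apply a rename to an in-memory {object: {stream name: data}} table."""
    err = check_stream_rename(old, new)
    if err:
        return err
    if old.stream is None:
        # Ordinary rename of the object itself; not modelled here.
        return 0
    names = streams[old.parent]
    if old.stream not in names:
        return -errno.ENOENT    # assumed error code
    # Assigning over an existing key removes the old stream of that name.
    names[new.stream] = names.pop(old.stream)
    return 0
```

For example, with streams "thumb" and "meta" on a.txt, renaming thumb to
meta succeeds and leaves a single stream called meta holding thumb's
data, while renaming a.txt's meta onto b.txt is refused.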