There is a need to preserve encrypted file data in case of storage failure and to allow safely moving the data between filesystems and systems without decrypting it, just like we would do for normal files. While backup and restore at the device level is sometimes an option, we also need to be able to carry out backup/restore at the ext4 file system level, for instance to allow changing formatting options.
The core principle we want to retain is that we must not make any clear text copy of encrypted files. This means backup/restore must be carried out without the encryption key.
The first challenge we have to address is to get access to raw encrypted files without the encryption key. By design, fscrypt does not allow this kind of access, and the ext4 file system does not allow reading or writing files flagged as encrypted if the encryption key is not provided. This restriction is not for security reasons, but to avoid applications accidentally accessing the ciphertext. A mechanism must be provided for access to both raw encrypted content and raw encrypted names.
The second challenge is to deal with the encrypted file's size, which differs depending on whether the file is accessed with or without the encryption key. For the backup operation to retrieve the full encrypted content, the encrypted file size should be reported as a multiple of the encryption chunk size when the encryption key is not present. The clear text file size (the size as seen with the encryption key) must be backed up as well in order to properly restore encrypted files later on. This information cannot be inferred by any other means.
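The size relationship above can be sketched in a few lines. This is only an illustration: the 4096-byte chunk size and the name raw_size are assumptions made for the example, not values taken from this proposal.

```python
# Assumed encryption chunk size for illustration only; the real value
# depends on how the filesystem and fscrypt are configured.
ENC_CHUNK_SIZE = 4096


def raw_size(clear_size: int, chunk: int = ENC_CHUNK_SIZE) -> int:
    """Size reported without the key: clear size rounded up to a full chunk."""
    if clear_size < 0:
        raise ValueError("size must be non-negative")
    return -(-clear_size // chunk) * chunk  # ceiling division, then scale back


print(raw_size(3012))  # 4096
print(raw_size(4000))  # 4096, same raw size as above
print(raw_size(4097))  # 8192
```

Two different clear text sizes (3012 and 4000 here) map to the same raw size, which is why the clear text size has to be saved separately.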
The third challenge is to get access to the encryption context of files and directories. By design, fscrypt does not expose this information, which is stored internally as an extended attribute but has no associated handler. However, making a backup of the encryption context is crucial because it preserves the information needed to later decrypt the file content. Restoring the encryption context is also a non-trivial operation: fscrypt only allows an encryption context to be set on a new file or an existing but empty directory.
In order to address this need for backup/restore of encrypted files, we propose to make use of a special extended attribute named security.encdata, containing:
- encoding method used for binary data. Assume name can be up to 255 chars.
- clear text file data length in bytes (set to 0 for dirs).
- encryption context. 40 bytes for v2 encryption context.
- encrypted name. 256 bytes max.

To improve portability if we need to change the on-disk format in the future, and to make the archived data useful over a longer timeframe, the content of the security.encdata xattr is expressed as ASCII text with a "key: value" YAML format. As encryption context and encrypted file name are binary, they need to be encoded.
So the content of the security.encdata xattr would be something like:

  { encoding: base64url, size: 3012, enc_ctx: YWJjZGVmZ2hpamtsbW
  5vcHFyc3R1dnd4eXphYmNkZWZnaGlqa2xtbg, enc_name: ZmlsZXdpdGh2ZX
  J5bG9uZ25hbWVmaWxld2l0aHZlcnlsb25nbmFtZWZpbGV3aXRodmVyeWxvbmdu
  YW1lZmlsZXdpdGg }

Because base64 encoding has a 33% overhead, this gives us a maximum xattr size of approximately 800 characters. This extended attribute would not be shown when listing xattrs, only exposed when fetched explicitly, and unmodified tools would not be able to access the encrypted files in any case. It would not be stored on disk, only computed when fetched.
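To make the format concrete, here is a minimal sketch of how a tool might build and parse such a value. The helper names (b64url, unb64url, build_encdata, parse_encdata) are hypothetical, and the sketch assumes unpadded base64url on a single line, as the example above suggests; the final format may differ.

```python
import base64


def b64url(data: bytes) -> str:
    """Encode bytes as base64url without '=' padding (assumed convention)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def unb64url(text: str) -> bytes:
    """Decode unpadded base64url by restoring the padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def build_encdata(size: int, enc_ctx: bytes, enc_name: bytes) -> str:
    """Format the four fields as a one-line 'key: value' mapping."""
    return "{ encoding: base64url, size: %d, enc_ctx: %s, enc_name: %s }" % (
        size, b64url(enc_ctx), b64url(enc_name))


def parse_encdata(text: str):
    """Return (size, enc_ctx, enc_name) from a value made by build_encdata."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("not a { key: value, ... } mapping")
    fields = {}
    for item in body[1:-1].split(","):
        key, _, value = item.partition(":")
        fields[key.strip()] = value.strip()
    if fields.get("encoding") != "base64url":
        raise ValueError("unsupported encoding")
    return (int(fields["size"]),
            unb64url(fields["enc_ctx"]),
            unb64url(fields["enc_name"]))


ctx = b"abcdefghijklmnopqrstuvwxyzabcdefghijklmn"  # 40 placeholder bytes
value = build_encdata(3012, ctx, b"some-encrypted-name")
print(value)
print(parse_encdata(value) == (3012, ctx, b"some-encrypted-name"))  # True
```

With this encoding a 40-byte context becomes 54 characters and a 256-byte name becomes 342 characters, which is consistent with the size estimate above once the encoding method name and the keys are added.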
File and file system backups often use the tar utility either directly or under the covers. We propose to modify the tar utility to make it "encryption aware", but the same relatively small changes could be made to other common backup utilities like cpio as needed. When detecting ext4 encrypted files, tar would need to explicitly fetch the security.encdata extended attribute, and store it along with the backup file. Fetching this extended attribute would internally trigger in ext4 a mechanism responsible for gathering the required information. Because we must not make any clear text copy of encrypted files, the encryption key must not be present. Tar would also need to use a special flag that would allow reading raw data without the encryption key. Such a flag could be named O_FILE_ENC, and would need to be coupled with O_DIRECT so that the page cache does not see this raw data. O_FILE_ENC could take the value of (O_NOCTTY | O_NDELAY), as these flags are unlikely to be used in practice and are not harmful if used incorrectly. The name of the backed-up file would be the encoded+digested form returned by fscrypt.
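On the backup side, the open logic could look roughly like the sketch below. O_FILE_ENC does not exist in any released kernel; its value here is the one suggested above. The function name open_for_backup and the injectable opener parameter are hypothetical and only there so the logic can be exercised without kernel support. The retry is triggered by ENOKEY, the error a backup tool would see when the key is absent.

```python
import errno
import os

# Proposed flag value from this RFC; not defined by any released kernel.
O_FILE_ENC = os.O_NOCTTY | os.O_NDELAY


def open_for_backup(path, opener=os.open):
    """Open a file for backup and return (fd, raw).

    A normal read-only open is tried first. If it fails with ENOKEY, the
    open is retried with O_FILE_ENC | O_DIRECT to read the raw ciphertext.
    `raw` tells the caller whether security.encdata must be saved as well.
    """
    try:
        return opener(path, os.O_RDONLY), False
    except OSError as exc:
        if exc.errno != errno.ENOKEY:
            raise  # unrelated failure, let the caller handle it
    return opener(path, os.O_RDONLY | O_FILE_ENC | os.O_DIRECT), True
```

Reads through O_DIRECT typically need suitably aligned buffers, which this sketch leaves out.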
On restore, the tar utility would be used to extract a previously created tarball containing encrypted files. When restoring the security.encdata extended attribute, instead of storing the xattr as-is on disk, this would internally trigger in ext4 a mechanism responsible for extracting the required information and storing it accordingly. Tar would also need to specify the O_FILE_ENC | O_DIRECT flags to write raw data without the encryption key.
To create a valid encrypted file with proper encryption context and encrypted name, we can implement a mechanism where the file is first created with O_TMPFILE in the encrypted directory to avoid triggering the encryption context check before setting the security.encdata xattr, and then atomically linking it to the namespace with the correct encrypted name.
From a security standpoint, doing backup and restore of encrypted files must not compromise their security. This is the reason why we want to carry out these operations without the encryption key. It avoids making a clear text copy of encrypted files. The security.encdata extended attribute contains the encryption context of the file or directory. This has a 16-byte nonce (per-file random value) that is used along with the master key to derive the per-file key using a key derivation function (KDF). But the master key is not stored in ext4, so it is not backed up as part of the scenario described above, which makes the backup of the raw encrypted files safe. The process of restoring encrypted files must not change the encryption context associated with the files. In particular, setting an encryption context on a file must be possible only once, when the file is restored. And the newly introduced capability of restoring encrypted files must not give the ability to set an arbitrary encryption context on files.
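To illustrate why backing up the nonce alone is harmless, here is a generic HKDF-SHA512 sketch (RFC 5869) that derives a per-file key from a master key and a nonce. It is not fscrypt's actual derivation: the parameters, the use of the bare nonce as the info string, and the function name derive_file_key are assumptions made for the example.

```python
import hashlib
import hmac


def derive_file_key(master_key: bytes, nonce: bytes, length: int = 64) -> bytes:
    """Generic HKDF-SHA512: extract with an all-zero salt, then expand."""
    # Extract step: PRK = HMAC(salt, input key material).
    prk = hmac.new(b"\x00" * 64, master_key, hashlib.sha512).digest()
    # Expand step: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + nonce + bytes([counter]),
                         hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


nonce = bytes(range(16))  # placeholder 16-byte per-file nonce
key_a = derive_file_key(b"A" * 64, nonce)
key_b = derive_file_key(b"B" * 64, nonce)
print(key_a != key_b)  # True
```

The same nonce yields unrelated keys under different master keys, so someone holding only the backup (nonce and ciphertext) has nothing to derive the per-file key from.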
From the backup tool point of view, the only changes needed would be to add "O_FILE_ENC" when the open fails with ENOKEY, and then explicitly back up the "security.encdata" xattr with the file. On restore, if the "security.encdata" xattr is present, then the file should be created in the directory with O_TMPFILE before restoring the xattrs and file data, and then link() should be used to link the file into the directory with the encrypted filename.
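The restore ordering described here can be sketched as follows. Since none of this works without the proposed kernel changes, the system calls are reached through a hypothetical `ops` object (with open_tmpfile, setxattr, write, link_into and close methods) rather than called directly. O_FILE_ENC again uses the value suggested in this proposal, and restore_encrypted_file is a made-up name.

```python
import os

# Proposed flag value from this RFC; not defined by any released kernel.
O_FILE_ENC = os.O_NOCTTY | os.O_NDELAY


def restore_encrypted_file(ops, dir_path, enc_name, encdata, raw_data):
    """Restore one encrypted file without the encryption key.

    `ops` is a hypothetical stand-in for the real system calls, so the
    ordering can be shown (and tested) without kernel support.
    """
    flags = os.O_TMPFILE | os.O_WRONLY | O_FILE_ENC | os.O_DIRECT
    # 1. Create an unnamed file in the (encrypted) target directory.
    fd = ops.open_tmpfile(dir_path, flags)
    try:
        # 2. Hand the saved context, size and name back to the filesystem.
        ops.setxattr(fd, "security.encdata", encdata)
        # 3. Write the raw ciphertext.
        ops.write(fd, raw_data)
        # 4. Only now give the file its encrypted name in the directory.
        ops.link_into(fd, dir_path, enc_name)
    finally:
        ops.close(fd)
```

Setting the xattr before writing any data follows the order given above, and linking last means a partially restored file never becomes visible in the namespace.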
From the filesystem point of view, the filesystem needs to generate the encdata xattr on getxattr(), and interpret it correctly on setxattr(). The VFS needs to allow open() and link() on encrypted files with O_FILE_ENC.
If this proposal is OK I can provide a series of patches to implement this.