I am planning to implement backup and restore for fscrypt files and
directories. I propose the following design and would welcome feedback
on this approach.
There is a need to preserve encrypted file data in case of storage
failure and to allow safely moving the data between filesystems and
systems without decrypting it, just as we can for normal files.
While backup and restore at the device level is sometimes an option, we
also need to be able to carry out backup/restore at the ext4 file system
level, for instance to allow changing formatting options.
The core principle we want to retain is that we must not make any clear
text copy of encrypted files. This means backup/restore must be carried
out without the encryption key.
The first challenge is to get access to raw encrypted files without the
encryption key. By design, fscrypt does not allow such access: ext4 will
not let applications read or write files flagged as encrypted if the
encryption key is not available. This restriction is not there for
security reasons, but to prevent applications from accidentally
accessing the ciphertext. A mechanism must be provided to access both
raw encrypted content and raw encrypted names.
The second challenge is the encrypted file's size, which differs
depending on whether the file is accessed with or without the encryption
key. For the backup operation to retrieve the full encrypted content,
the size reported when the encryption key is not present should be the
clear text size rounded up to a multiple of the encryption chunk size.
The clear text file size (the size seen with the encryption key) must
also be backed up in order to properly restore encrypted files later on,
as it cannot be inferred by any other means.
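As a minimal sketch of the size rule above, assuming the encryption
chunk size is a power of two such as the 4096-byte filesystem block
size (the function name is hypothetical), the size reported without the
key could be computed as:

```c
#include <stdint.h>

/*
 * Hypothetical helper: size reported for an encrypted regular file
 * opened without its key, i.e. the clear text size rounded up to a
 * whole number of encryption chunks. chunk_size must be a power of two.
 */
static uint64_t raw_encrypted_size(uint64_t clear_size, uint64_t chunk_size)
{
	return (clear_size + chunk_size - 1) & ~(chunk_size - 1);
}
```

For a 3012-byte file this reports 4096 bytes without the key. Since
every clear text size from 1 to 4096 maps to the same rounded value,
the value 3012 itself has to be saved separately.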
The third challenge is to get access to the encryption context of files
and directories. By design, fscrypt does not expose this information: it
is stored internally as an extended attribute, but with no associated
xattr handler. However, backing up the encryption context is crucial
because it preserves the information needed to later decrypt the file
content. Restoring the encryption context is also non-trivial, because
fscrypt only allows an encryption context to be set on a new file or on
an existing empty directory.
In order to address this need for backup/restore of encrypted files, we
propose to use a special extended attribute named security.encdata,
containing:
- the encoding method used for binary data (its name is assumed to be
  up to 255 characters).
- the clear text file data length in bytes (set to 0 for directories).
- the encryption context (40 bytes for a v2 encryption context).
- the encrypted name (256 bytes max).
To improve portability in case the on-disk format changes in the
future, and to keep the archived data usable over a longer timeframe,
the content of the security.encdata xattr is expressed as ASCII text in
a YAML "key: value" format. As the encryption context and the encrypted
file name are binary, they need to be encoded.
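To illustrate the encoding step, here is a minimal unpadded base64url
encoder (RFC 4648, section 5) written as a userspace sketch. The
function name is hypothetical, and dropping the '=' padding is an
assumption based on the example value below, which has none.

```c
#include <stddef.h>

/* URL-safe alphabet: '-' and '_' replace '+' and '/'. */
static const char b64url_alphabet[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/*
 * Encode n bytes from in as unpadded base64url into out, which must
 * hold (4 * n + 2) / 3 + 1 bytes. Returns the number of characters
 * written, not counting the terminating NUL.
 */
static size_t b64url_encode(const unsigned char *in, size_t n, char *out)
{
	size_t i = 0, o = 0;

	/* Each full 3-byte group becomes 4 characters. */
	for (; i + 3 <= n; i += 3) {
		unsigned v = (unsigned)in[i] << 16 | (unsigned)in[i + 1] << 8 | in[i + 2];
		out[o++] = b64url_alphabet[(v >> 18) & 0x3f];
		out[o++] = b64url_alphabet[(v >> 12) & 0x3f];
		out[o++] = b64url_alphabet[(v >> 6) & 0x3f];
		out[o++] = b64url_alphabet[v & 0x3f];
	}
	/* A trailing 1 or 2 bytes become 2 or 3 characters, without padding. */
	if (n - i == 1) {
		unsigned v = (unsigned)in[i] << 16;
		out[o++] = b64url_alphabet[(v >> 18) & 0x3f];
		out[o++] = b64url_alphabet[(v >> 12) & 0x3f];
	} else if (n - i == 2) {
		unsigned v = (unsigned)in[i] << 16 | (unsigned)in[i + 1] << 8;
		out[o++] = b64url_alphabet[(v >> 18) & 0x3f];
		out[o++] = b64url_alphabet[(v >> 12) & 0x3f];
		out[o++] = b64url_alphabet[(v >> 6) & 0x3f];
	}
	out[o] = '\0';
	return o;
}
```

Encoding the 40 ASCII bytes "abcdefghijklmnopqrstuvwxyzabcdefghijklmn"
this way yields the 54-character enc_ctx value used in the example.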
So the content of the security.encdata xattr would be something like:
  { encoding: base64url, size: 3012,
    enc_ctx: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXphYmNkZWZnaGlqa2xtbg,
    enc_name: ZmlsZXdpdGh2ZXJ5bG9uZ25hbWVmaWxld2l0aHZlcnlsb25nbmFtZWZpbGV3aXRodmVyeWxvbmduYW1lZmlsZXdpdGg }
Because base64 encoding has a 33% overhead, this gives us a maximum
xattr size of approximately 800 characters.
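The approximate 800-character bound can be checked with a small
calculation. This sketch assumes the limits listed above (a 255-character
encoding name, a 64-bit size printed in decimal, a 40-byte context, a
256-byte name) and the separators used in the example; the constant and
function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define ENCODING_NAME_MAX_CHARS 255 /* encoding method name */
#define SIZE_MAX_DIGITS 20          /* decimal digits of UINT64_MAX */
#define ENC_CTX_V2_BYTES 40         /* v2 encryption context */
#define ENC_NAME_MAX_BYTES 256      /* encrypted name */

/* Length of unpadded base64url output for n input bytes. */
static size_t b64url_len(size_t n)
{
	return (4 * n + 2) / 3;
}

/* Worst-case length of the security.encdata text, excluding the NUL. */
static size_t encdata_max_len(void)
{
	/* Fixed text around the four values, as in the example. */
	size_t fixed = strlen("{ encoding: ") + strlen(", size: ") +
		       strlen(", enc_ctx: ") + strlen(", enc_name: ") +
		       strlen(" }");

	return fixed + ENCODING_NAME_MAX_CHARS + SIZE_MAX_DIGITS +
	       b64url_len(ENC_CTX_V2_BYTES) + b64url_len(ENC_NAME_MAX_BYTES);
}
```

This gives 716 characters in the worst case, which stays within the
approximate 800-character budget.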
This extended attribute would not be shown when listing xattrs and would
only be exposed when fetched explicitly by name; unmodified tools would
not be able to access the encrypted files in any case. It would not be
stored on disk, only computed when fetched.
File and file system backups often use the tar utility, either directly
or under the covers. We propose to modify tar to make it "encryption
aware", but the same relatively small changes could be made to other
common backup utilities like cpio as needed. When it detects an ext4
encrypted file, tar would explicitly fetch the security.encdata extended
attribute and store it along with the backed-up file. Fetching this
extended attribute would internally trigger an ext4 mechanism
responsible for gathering the required information. Because we must not
make any clear text copy of encrypted files, the encryption key must not
be present. Tar would also need to use a special open flag that allows
reading raw data without the encryption key. Such a flag could be named
O_FILE_ENC, and would need to be coupled with O_DIRECT so that this raw
data never enters the page cache. O_FILE_ENC could take the value
(O_NOCTTY | O_NDELAY), as this combination is unlikely to be used in
practice and is not harmful if used incorrectly. The name of the
backed-up file would be the encoded (and, for long names, digested) form
returned by fscrypt.
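A minimal sketch of the backup-side logic described above. O_FILE_ENC
does not exist today: its value here is the one suggested in this
proposal, and the helper names are hypothetical. On current kernels the
fallback path is simply never taken for unencrypted files.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical flag, using the value suggested in this proposal. */
#ifndef O_FILE_ENC
#define O_FILE_ENC (O_NOCTTY | O_NDELAY)
#endif

/*
 * Open a file for backup. If the normal open fails with ENOKEY (the file
 * is encrypted and its key is absent), retry in raw ciphertext mode,
 * which must be combined with O_DIRECT. *raw reports which mode was used.
 */
static int backup_open(const char *path, bool *raw)
{
	int fd = open(path, O_RDONLY);

	*raw = false;
	if (fd < 0 && errno == ENOKEY) {
		fd = open(path, O_RDONLY | O_FILE_ENC | O_DIRECT);
		if (fd >= 0)
			*raw = true;
	}
	return fd;
}

/*
 * Fetch security.encdata by name, since it would not be listed by
 * listxattr(). Returns the value length, or -1 with errno set (for
 * example ENODATA on an unencrypted file).
 */
static ssize_t backup_get_encdata(int fd, char *buf, size_t len)
{
	return fgetxattr(fd, "security.encdata", buf, len);
}
```

Reads on a descriptor opened with O_DIRECT generally need aligned
buffers (for example from posix_memalign()), so tar's read path would
need to account for that in raw mode.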
On restore, tar would extract a previously created tarball containing
encrypted files. When the security.encdata extended attribute is
restored, instead of storing the xattr as-is on disk, ext4 would
internally trigger a mechanism responsible for extracting the required
information and storing it accordingly. Tar would also need to specify
the O_FILE_ENC | O_DIRECT flags to write raw data without the
encryption key.
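On the restore side, the xattr text has to be parsed back into its
fields. As a userspace sketch of that parsing, assuming exactly the four
keys in the order used in the example (the struct and function names
are hypothetical; the in-kernel setxattr() path would need equivalent
logic):

```c
#include <stdio.h>

/* Hypothetical decoded view of security.encdata, sized from the limits above. */
struct encdata {
	char encoding[256];      /* encoding method name, up to 255 chars */
	unsigned long long size; /* clear text size, 0 for directories */
	char enc_ctx[64];        /* base64url of a 40-byte v2 context: 54 chars */
	char enc_name[352];      /* base64url of up to 256 bytes: 342 chars */
};

/*
 * Parse "{ encoding: ..., size: ..., enc_ctx: ..., enc_name: ... }".
 * Whitespace, including line breaks, is accepted between tokens.
 * Returns 0 on success, -1 if a field is missing or malformed.
 */
static int encdata_parse(const char *s, struct encdata *e)
{
	int n = sscanf(s,
		       " { encoding: %255[^,\n ] , size: %llu , "
		       "enc_ctx: %63[^,\n ] , enc_name: %351[^}\n ]",
		       e->encoding, &e->size, e->enc_ctx, e->enc_name);

	return n == 4 ? 0 : -1;
}
```

A real parser would probably need to accept keys in any order, so that
the format can evolve, and to decode and length-check the base64url
fields before using them.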
To create a valid encrypted file with the proper encryption context and
encrypted name, we can implement a mechanism where the file is first
created with O_TMPFILE in the encrypted directory, which avoids
triggering the encryption context check before the security.encdata
xattr is set, and is then atomically linked into the namespace with the
correct encrypted name.
From a security standpoint, doing backup and restore of encrypted files
must not compromise their security. This is the reason why we want to
carry out these operations without the encryption key. It avoids making
a clear text copy of encrypted files.
The security.encdata extended attribute contains the encryption context
of the file or directory. This includes a 16-byte nonce (a per-file
random value) that is used along with the master key to derive the
per-file key through a KDF. But the master key is not stored in ext4, so
it is not backed up as part of the scenario described above, which makes
backing up the raw encrypted files safe.
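To show what is (and is not) in the backed-up context, here is a
userspace mirror of the v2 context layout as I understand it from the
kernel's fs/crypto/fscrypt_private.h. The struct name is hypothetical,
and the reserved bytes are simplified: recent kernels use the first of
them as log2_data_unit_size, older ones reserve all four.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the 40-byte v2 encryption context. */
struct fscrypt_context_v2_mirror {
	uint8_t version;                   /* 2 for v2 contexts */
	uint8_t contents_encryption_mode;
	uint8_t filenames_encryption_mode;
	uint8_t flags;
	uint8_t reserved[4];
	uint8_t master_key_identifier[16]; /* derived from the key, not the key */
	uint8_t nonce[16];                 /* per-file random value fed to the KDF */
};

_Static_assert(sizeof(struct fscrypt_context_v2_mirror) == 40,
	       "v2 encryption context is 40 bytes");

/* Return a pointer to the 16-byte nonce inside a raw 40-byte context. */
static const uint8_t *ctx_v2_nonce(const uint8_t ctx[40])
{
	return ctx + offsetof(struct fscrypt_context_v2_mirror, nonce);
}
```

The context carries only the nonce and an identifier of the master key,
so archiving it does not archive any key material.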
The process of restoring encrypted files must not change the encryption
context associated with the files. In particular, setting an encryption
context on a file must be possible only once, when the file is restored.
Also, the newly introduced ability to restore encrypted files must not
allow setting an arbitrary encryption context on files.
From the backup tool's point of view, the only changes needed would be
to retry the open with O_FILE_ENC | O_DIRECT when it fails with ENOKEY,
and then to explicitly back up the "security.encdata" xattr with the
file. On restore, if the "security.encdata" xattr is present, the file
should be created in the directory with O_TMPFILE before restoring the
xattrs and file data, and then linked into the directory with link()
using the encrypted filename.
From the filesystem's point of view, ext4 needs to generate the encdata
xattr on getxattr() and interpret it correctly on setxattr(). The VFS
needs to allow open() and link() on encrypted files with O_FILE_ENC.
If this proposal is acceptable, I can provide a patch series to
implement it.