On Mon, Oct 25, 2021 at 12:49:51PM -0700, Eric Biggers wrote: > On Thu, Oct 21, 2021 at 11:34:19AM -0700, Omar Sandoval wrote: > > Hello, > > > > I've been working on adding fscrypt support to Btrfs. Btrfs has some > > features (namely, reflinks and snapshots) that don't work well with the > > existing fscrypt encryption policies. I've been discussing and > > prototyping how to support these Btrfs features with fscrypt, so I > > figured it was high time I write it down and loop in the fscrypt > > developers as well. > > > > Here is the Google Doc: > > https://docs.google.com/document/d/1iNnrqyZqJ2I5nfWKt7cd1T9xwU0iHhjhk9ALQW3XuII/edit?usp=sharing > > > > Please feel free to comment there or via email. > > > > Just some preliminary comments: > > Given that you need reflinking to remain supported, for file contents encryption > I think it's the right choice to store the IVs explicitly rather than have them > determined by the offset within the file. > > How many derived encryption keys to use is somewhat orthogonal to that. As I > mentioned in my other mail, you could still have one key per extent rather than > one per encryption policy as you're proposing. I'm *guessing* it wouldn't be > practical, and I don't consider it to be required (just preferable), but the > document doesn't discuss this possibility at all. I overlooked this option because my gut instinct was that the memory usage would be prohibitive. It looks like one AES-256-XTS prepared key is about 1k in memory (960 bytes for the encryption and decryption key schedules for each key, plus a bit more for the crypto API structures). I thought it'd be too expensive to store this naively for each cached extent. However, across various machines I checked, the number of cached inodes and the number of cached extents is in the same order magnitude (and in fact, almost equal in many cases). So per-extent keys aren't out of the question. We can store a 16-byte nonce in the extent, use that to derive the per-extent key from the master key, and use the offset in the extent as the IV. I'll think about it some more and make sure I'm not missing anything. > Storing just the "starting IV" for each extent also makes sense, assuming that > you only want to support an unauthenticated mode such as AES-XTS. However, > given that btrfs is a copy-on-write filesystem and thus can support per-block > metadata, a natural question is why not support an authenticated mode such as > AES-GCM, with a nonce and authentication tag stored per block? Have you thought > about this? > > Now, I personally think that authenticating file contents only wouldn't give > much benefit, and whole-filesystem authentication would be needed to get a real > benefit. But "why aren't you using an authenticated mode" is a *very* common > question, so you need an answer to that -- or ideally, just support it if it > isn't much work. We already store a checksum per block; I don't see any reason that it couldn't be a MAC. Johannes Thumshirn had a proof of concept for storing an HMAC for all blocks: https://lore.kernel.org/linux-btrfs/20191015121405.19066-1-jthumshirn@xxxxxxx/#b Plumbing it through for authenticated encryption would be a little harder, but probably not by much. > What is your proposal for how filenames encryption would work when the > EXPLICIT_IV flag is used? That doesn't appear to be mentioned. Since there's no such thing as "reflinking" filenames, I think filename encryption can be unchanged, i.e., per-directory encryption keys. (This would probably be the case with per-extent keys for data, as well.) > Finally, the proposal to allow encrypting the changed data of snapshots is a > larger departure from the fscrypt model. I'm still trying to wrap my head > around how that could work. Could you provide any more details about that? > E.g. what metadata would actually be stored on-disk, and how would it be used? > When would things be done in terms of filesystem operations? E.g. let's say I > open a file for writing -- would the encryption key be set up right away, or > would it not happen until I actually write data? On disk, we still only need to store the usual fscrypt context. It will always be present for the top-level of the snapshot. It may or may not be present for any files or directories under that. In memory, we'd store whether the subvolume is encrypted. This would be set when enabling encryption and when caching the subvolume. Since every inode has a reference to the subvolume it is in, and inodes can't move between subvolumes, all we need is a check like: if (IS_ENCRYPTED(inode->subvolume) && !IS_ENCRYPTED(inode)) set_up_encryption(inode); I'm leaning towards doing that either at the time that userspace writes the data, or at the time that we're flushing the data to disk, whichever ends up being more convenient for Btrfs. I'd rather not do it at open time. Thanks for the very helpful reply!