On Thu, Sep 14, 2023 at 11:48:16PM -0700, Eric Biggers wrote:
> On Thu, Sep 14, 2023 at 08:47:41PM -0400, Josef Bacik wrote:
> > Hello,
> >
> > This is meant as a replacement for the last set of patches Sweet Tea sent [1].
> > This is an attempt to find a different path forward. Strip down everything to
> > the basics. Essentially all we appear to need is a nonce, and then we can use
> > the inode context to derive per-extent keys.
> >
> > I'm sending this as an RFC to see if this is a better direction to try and make
> > some headway on this project. The btrfs side doesn't change too much, the code
> > just needs to be adjusted to use the new helpers for the extent contexts. I
> > have this work mostly complete, but I'm afraid I won't have it ready for another
> > day or two and I want to get feedback on this ASAP before I burn too much time
> > on it.
> >
> > Additionally there is a callback I've put in the inline block crypto stuff that
> > we need in order to handle the checksumming. I made my best guess here as to
> > what would be the easiest and simplest way to achieve what we need, but I'm open
> > to suggestions here.
> >
> > The other note is I've disabled all of the policy variations other than default
> > v2 policies if you enable extent encryption. This is for simplicity's sake. We
> > could probably make most of it work, but reflink is basically impossible for v1
> > with direct key, and is problematic for the lblk related options. It appears
> > this is fine, as those other modes are for specific use cases and the vast
> > majority of normal users are encouraged to use normal v2 policies anyway.
> >
> > This stripped down version gives us most of what we want, we can reflink between
> > different inodes that have the same policy. We lose the ability to mix
> > differently encrypted extents in the same inode, but this is an acceptable
> > limitation for now.
> >
> > This has only been compile tested, and as I've said I haven't wired it
> > completely up into btrfs yet. But this is based on a rough wire up and appears
> > to give us everything we need. The btrfs portion of Sweet Tea's patches are
> > basically untouched except where we use these helpers to deal with the extent
> > contexts. Thanks,
> >
> > Josef
> >
> > [1] https://lore.kernel.org/linux-fscrypt/cover.1693630890.git.sweettea-kernel@xxxxxxxxxx/
> >
> > Josef Bacik (4):
> >   fscrypt: rename fscrypt_info => fscrypt_inode_info
> >   fscrypt: add per-extent encryption support
> >   fscrypt: disable all but standard v2 policies for extent encryption
> >   blk-crypto: add a process bio callback
> >
> >  block/blk-crypto-fallback.c |  18 ++++
> >  block/blk-crypto-profile.c  |   2 +
> >  block/blk-crypto.c          |   6 +-
> >  fs/crypto/crypto.c          |  23 +++--
> >  fs/crypto/fname.c           |   6 +-
> >  fs/crypto/fscrypt_private.h |  78 ++++++++++++----
> >  fs/crypto/hooks.c           |   2 +-
> >  fs/crypto/inline_crypt.c    |  50 +++++++++--
> >  fs/crypto/keyring.c         |   4 +-
> >  fs/crypto/keysetup.c        | 174 ++++++++++++++++++++++++++++++++----
> >  fs/crypto/keysetup_v1.c     |  14 +--
> >  fs/crypto/policy.c          |  45 ++++++++--
> >  include/linux/blk-crypto.h  |   9 +-
> >  include/linux/fs.h          |   4 +-
> >  include/linux/fscrypt.h     |  41 ++++++++-
> >  15 files changed, 400 insertions(+), 76 deletions(-)
>
> Thanks Josef! At a high level this looks good to me. It's much simpler. I
> guess my main question is "what is missing" (besides the obvious things like
> updating the documentation and polishing code comments). I see you got rid of a
> lot of the complexity in Sweet Tea's patchset, which is great as I think a lot
> of it was unnecessary as I've mentioned. But maybe something got overlooked?
> I'm mainly wondering about the patches like "fscrypt: allow asynchronous info
> freeing" that were a bit puzzling but have now gone away.
>

I'm going to fix this in a different way internally in btrfs.
There's only one place where we can't drop the lock, so I plan to just collate
freed EMs and free them in bulk after we drop the lock. This *may* not fix the
problem, I have to wait for lockdep to tell me I'm stupid and I missed some
other dependency, but if that's the case I'll just async free our EMs, that way
I can synchronize dropping the objects at inode drop time to avoid unpleasant
timing issues.

> Not supporting v1 encryption policies is the right call, I think. xfstests will
> need to be updated to not assume that v1 is always supported, but that's
> something I've been thinking about doing anyway.
>
> The patch that adds support for checksumming the on-disk data is new. I see why
> it's needed. I suppose that's just been overlooked until now? It's definitely
> correct that you need to checksum the ciphertext, not the plaintext. Otherwise
> the checksums leak information about the plaintext.
>

I think Sweet Tea was leaving this as a followup exercise, but I'd rather have
everything working out of the box, and since my patchset was a lot simpler I
figured I'd give you more opportunities to yell at me for something.

> Did you consider the idea I mentioned at the end of
> https://lore.kernel.org/r/20230907055233.GB37146@sol.localdomain where we store
> a full fscrypt_context per extent, but for now validate that it matches the
> inode's context (minus the nonce) and only support that case?
>

I missed this bit of feedback, I like it a lot actually because if we do decide
later to tackle key changing it would be nice if existing file systems had
everything they needed. I think I'll still keep the slimmed down extent context
and just throw the master key in there and the encryption type, and then do as
you say and validate it matches the inode.
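
The check described above (store the master key identifier and encryption type
in the extent context, then require that they match the inode's, ignoring the
nonce) could be sketched roughly as follows. This is a standalone userspace
illustration, not code from the patchset: the struct layouts, field names,
sizes, and the function name are all hypothetical and do not reflect the real
fscrypt context format.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration. */
#define SKETCH_KEY_ID_SIZE 16
#define SKETCH_NONCE_SIZE  16

/* Hypothetical stand-in for the inode's encryption context. */
struct sketch_inode_ctx {
	uint8_t contents_mode;                      /* encryption type */
	uint8_t master_key_id[SKETCH_KEY_ID_SIZE];  /* master key identifier */
	uint8_t nonce[SKETCH_NONCE_SIZE];           /* per-inode nonce */
};

/* Hypothetical stand-in for the slimmed down per-extent context. */
struct sketch_extent_ctx {
	uint8_t contents_mode;
	uint8_t master_key_id[SKETCH_KEY_ID_SIZE];
	uint8_t nonce[SKETCH_NONCE_SIZE];           /* per-extent nonce */
};

/*
 * Return true if the extent's encryption type and master key identifier
 * match the inode's.  The nonces are deliberately not compared, since each
 * extent is expected to carry its own.
 */
bool sketch_extent_matches_inode(const struct sketch_extent_ctx *ext,
				 const struct sketch_inode_ctx *ino)
{
	if (ext->contents_mode != ino->contents_mode)
		return false;
	return memcmp(ext->master_key_id, ino->master_key_id,
		      SKETCH_KEY_ID_SIZE) == 0;
}
```

With a check along these lines, a mismatched extent would be rejected at load
time rather than silently decrypted with the wrong settings, while the on-disk
format would already carry the fields a later re-keying feature would need.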

> I guess the reasons to do that would be (1) futureproofing, (2) error checking
> to catch any bugs where an extent might be accessed inconsistently, and (3)
> making extents "standalone" so that they can be decrypted by anything that
> iterates through the extents only (e.g. btrfs scrub as mentioned by Sweet Tea;
> though, how will scrub even have access to the encryption keys?). I don't have
> a great sense of how strong these reasons actually are, so any thoughts would be
> appreciated. If just the nonce is really all that's needed, then that's fine
> too. The point is, the part I was concerned about wasn't really whether the key
> identifier and encryption settings get stored per extent, but rather whether we
> actually support the case where these differ from the inode's.

Scrub just needs to read the raw bytes and validate the checksums. This is my
other motivation for doing the checksum thing now, I want to make sure things
like scrub work out of the box since that's one of the "tricky" parts when it
comes to having data it can't actually decrypt.

The key identifier is mostly around being able to simply support different keys
in the same inode, primarily for re-keying. There is some talk of people wanting
to have differently encrypted subvolumes and reflink between them, and having
this information per-extent would make that a lot easier. For now I've told
those people to kick rocks, but it'll be nice to be set up to add this support
later on.

>
> (And by "the inode" I really mean "the inode that owns the extent cache entry
> the extent is being accessed through". It's the case that when an extent is
> shared by multiple inodes, it gets a cache entry for each one, right?)
>

Yup the cache entry gets loaded from the extent itself, so we'll have duplicate
cache entries since they're tied to the inode. It's the same for our extent maps
as well, since you could at any point overwrite part of it and need to update
the in-memory mapping.
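
To make the scrub point above concrete: because the checksum covers the
ciphertext, verification needs only the raw on-disk bytes and the stored
checksum, never a key. The following is a minimal userspace sketch under that
assumption. All function names are hypothetical, and CRC-32C is used only as a
stand-in checksum; none of this is taken from the btrfs or blk-crypto code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t sketch_crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Write side (hypothetical): the checksum is computed over the bytes that
 * actually reach the disk, i.e. the ciphertext, after encryption.
 */
uint32_t sketch_csum_for_write(const uint8_t *ciphertext, size_t len)
{
	return sketch_crc32c(ciphertext, len);
}

/*
 * Scrub side (hypothetical): re-checksum the raw bytes read from disk and
 * compare with the stored value.  No key and no decryption are involved.
 */
bool sketch_scrub_block_ok(const uint8_t *raw, size_t len, uint32_t stored)
{
	return sketch_crc32c(raw, len) == stored;
}
```

Had the checksum been taken over the plaintext instead, the scrub-side function
would need to decrypt before verifying, and the stored checksums would leak
information about the plaintext as noted earlier in the thread.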

Thanks for the quick feedback. I'm going to finish making the appropriate
changes to the btrfs patches and make sure everything works. I'll clean up
these patches and make the changes you suggested, and I'll update the
documentation. It'll probably be early next week when I send again. I'll also do
the work for xfstests so everything works in this v2 only world.

Josef