Hi,

I would like to build support for file content encryption in the kAFS
filesystem driver in the Linux kernel - but this needs standardising so that
other AFS filesystems can make use of it also.

Note that by "content encryption", I mean that only the permitted clients
have a key to the content.  The server does not.  Further, filenames may also
be encrypted.

For the kAFS filesystem, content encryption would be provided by the netfs
library.  The intention is that netfslib will provide such a service to any
filesystem that uses it (afs, 9p, cifs and, hopefully soon, ceph), using
Linux fscrypt where possible (though that is not mandatory).  netfslib would
then store the encrypted content in the local cache also and only decrypt it
when it's brought into memory.

Now, the way I would envision this working is:

 (1) Each file is divided into units of 4KiB, each of which is encrypted
     separately with its own block key.  The block key is derived from the
     file key and the block offset.

 (2) Unfortunately, AFS does not have anywhere to store additional
     information for a file, such as xattrs, but the last block must be
     rounded out to at least the crypto block size and maybe the unit size -
     and we need to stash the real file size somewhere.

     There are a number of ways this could be dealt with:

     (a) Store this extra metadata in a separate file.  This has a potential
         integrity issue if we fail to update that due to EDQUOT/ENOSPC,
         network loss, etc.

     (b) Round up the data part of the file to 4KiB and tack on a trailer at
         the end of file that has the real EOF in it.  This has the
         advantages that the trailer and the last block can be updated in a
         single StoreData RPC and that the real EOF can be encrypted, but the
         disadvantage that we can't return accurate info with stat() unless
         we can read (and decrypt) the trailer - and we have to do that in
         stat().

     (c) Stick a fixed-len trailer at the real EOF and just encrypt over part
         of that.
         Again, this can be updated in a single StoreData RPC and the real
         EOF can be calculated by simple subtraction.  The trailer need only
         be one crypto block (say 16 bytes) in size, not the full 4KiB.

     (d) Find a hole somewhere in the protocol and the on-server-disk
         metadata to store a number in the range 0-4095 that is backed up and
         transferred during a volume release.  I suspect this is infeasible.

     (e) Provide xattr support.  Probably also infeasible - though it might
         help with other things such as stacked filesystem support.

 (3) Mark a whole volume as being content-encrypted.  That is, content
     encryption is only available on a whole-volume basis unless we can find
     a way to mark individual vnodes as being encrypted - but this has the
     same issues as storing the real EOF length.

     This could be done in a number of ways:

     (a) A volume flag, passed to the client through the VLDB and the volume
         server.  The flag would need to be passed on to clone volumes and
         would need to be set at volume creation time or shortly thereafter.

         This might need a new RPC, say VOLSER.CreateEncryptedVolume, as
         VOLSER.CreateVolume doesn't seem to offer a way to indicate this,
         but maybe VOLSER.SetFlags would suffice: you turn it on and
         everything is suddenly encrypted.

     (b) Storing a magic file in the root directory of the volume
         (".afs_encrypted" say) that the client can look for.  This file
         could contain info about the algorithms used and information about
         the key needed to decrypt it.

 (4) Encrypt filenames in an encrypted directory.  Whilst we could just
     directly pass encrypted filenames in the protocol as the names are XDR
     strings with a length count, they can't be stored in the standard AFS
     directory format as they may include NUL and '/'.

     I can see two possibilities here:

     (a) base64 encode the encrypted filenames (using a modified base64 to
         exclude '/').  This has two disadvantages: it reduces the maximum
         usable name length to roughly 3/4 of the limit (every 3 bytes of
         ciphertext become 4 characters) and it makes all names longer,
         reducing the capacity of the directory.
     (b) Use the key to generate a series of numbers and then use each number
         to map a character of the filename, being careful to break the range
         around 0 and 47 so that we can map backwards.  This may result in
         less secure filename encryption than (a) and is trickier to do.

 (5) Derive file keys by combining a per-volume key with the vnode ID and the
     uniquifier.  Marking files with the 'name' of a specific key could be
     possible, but again this requires somewhere to store these as discussed
     in (2).

     Possibly 'file keys' could be skipped, deriving each block key from:

         RW vol ID || vnode ID || uniquifier || block pos

     The cell name cannot be included due to aliasing unless the canonical
     cell name can be queried.

 (6) Provide a conditional FS.StoreData RPC that takes a Data Version number
     as an additional parameter and fails if that doesn't match the current
     DV.  The issue is that even if just a byte is changed, an entire crypto
     unit must be written and truncation may also have to reencrypt the tail.

     (And by "fail", I'd prefer if it returned the updated stats rather than
     simply aborting - but I understand that we really want to close off the
     data transmission).

 (7) Though it's not strictly required for this, similar to (6), a
     conditional FS.FetchData could be useful as well for speculatively
     reading from a RO clone of a RW volume.  Again, rather than failing with
     an abort, I'd prefer this to return no data and just the updated stats.
     The client should then check the DV in the updated stats.

The simplest way to do this need not involve any changes on the server, though
having a conditional store would make it safer.

Thanks for your consideration,
David
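
For illustration only, points (2)(c), (4)(a), (5) and (6) can be sketched in
a few lines of Python.  This is a rough sketch, not a proposed format.  All of
the function names are made up for the example.  The use of HMAC-SHA256 as the
key derivation, the 64-bit big-endian packing of the IDs, the 16-byte trailer
always being present (even on an empty file) and the stripping of base64 '='
padding are assumptions made here and are not part of the proposal above.

```python
import base64
import hashlib
import hmac
import struct

UNIT_SIZE = 4096     # crypto unit size from (1)
TRAILER_SIZE = 16    # one crypto block, as floated in (2)(c)


def block_key(volume_key, rw_vol_id, vnode_id, uniquifier, block_pos):
    """Per-block key from RW vol ID || vnode ID || uniquifier || block pos.

    HMAC-SHA256 and the fixed-width packing are assumptions of this sketch.
    """
    info = struct.pack(">QQQQ", rw_vol_id, vnode_id, uniquifier, block_pos)
    return hmac.new(volume_key, info, hashlib.sha256).digest()


def real_eof(server_size):
    """Option (2)(c): recover the real EOF by subtracting the trailer length.

    Assumes every encrypted file, even an empty one, carries a trailer.
    """
    if server_size < TRAILER_SIZE:
        raise ValueError("file too short to hold a trailer")
    return server_size - TRAILER_SIZE


def units_touched(offset, length):
    """Indices of the crypto units a write of `length` bytes at `offset`
    has to re-encrypt and store in full, as noted in (6)."""
    if length <= 0:
        return range(0)
    first = offset // UNIT_SIZE
    last = (offset + length - 1) // UNIT_SIZE
    return range(first, last + 1)


def encode_name(encrypted_name):
    """Option (4)(a): base64 using '-' and '_' in place of '+' and '/'."""
    text = base64.urlsafe_b64encode(encrypted_name).decode("ascii")
    return text.rstrip("=")  # padding dropped to save directory space


def decode_name(stored_name):
    """Reverse encode_name(), restoring the stripped '=' padding first."""
    padding = "=" * (-len(stored_name) % 4)
    return base64.urlsafe_b64decode(stored_name + padding)
```

With this, a two-byte write at offset 4095 touches units 0 and 1, so both
would have to be fetched, re-encrypted and stored whole, which is where the
conditional store in (6) would help.  The encoded names contain neither NUL
nor '/', at the cost of the 4:3 expansion mentioned in (4)(a).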