On Mon, Aug 3, 2020 at 4:48 PM Joao Eduardo Luis <joao@xxxxxxx> wrote:
>
> This proposal might be a bit thin on details, but I would love to have some
> feedback and gauge the broader community's and developers' interest, as
> well as to poke holes in the current idea.
>
> All comments welcome.
>
>   -Joao
>
>
> MOTIVATION
> ----------
>
> Even though we currently have at-rest encryption, ensuring data security on
> the physical device, this is currently on a per-OSD basis, and it is too
> coarse-grained to allow different entities/clients/tenants to have their
> data encrypted with different keys.
>
> The intent here is to allow different tenants to have their data encrypted
> at rest, independently, and without necessarily relying on full OSD
> encryption. This way one could have anywhere from a handful to dozens or
> hundreds of tenants with their data encrypted on disk, while not having to
> maintain full at-rest encryption should the administrator consider it too
> cumbersome or unnecessary.

I would be interested to hear the tenant use-case where they trust the
backing storage system (Ceph) with all things encryption and don't have any
effective control over the keys / ciphers / rotation policies / etc. If you
have a vulnerability that exposes the current OSD dm-crypt keys, I would
think it would be possible to get the per-namespace keys through a similar
vector if they are stored effectively side-by-side?

> While there are very good arguments for ensuring this encryption is
> performed client-side, such that each client actively controls their own
> secrets, a server-side approach has several other benefits that may
> outweigh a client-side approach.
>
> On the one hand,
>
> * encrypting server-side means encrypting N times, depending on the
>   replication size and scheme;
> * the secrets keyring will be centralized, likely in the monitor, much like
>   what we do for dm-crypt, even though encrypted;
> * on-the-wire data will still need to rely on msgr2 encryption, though one
>   could argue that this will likely happen regardless of whether a client-
>   or server-side approach is used.
>
> But on the other,
>
> 1. encryption becomes transparent to the client, avoiding the effort of
>    implementing such schemes in client libraries and kernel drivers;

Just an FYI: krbd supports client-side encryption via dm-crypt, kernel CephFS
is actively looking to incorporate fscrypt, librbd can utilize QEMU-layered
LUKS for many use-cases, and work is in progress on built-in librbd
client-side encryption. RGW has had client-side encryption for a while.

> 2. tighter control over the unit of data being encrypted, reducing the load
>    of encrypting a whole object versus a disk block in bluestore.

RBD client-side encryption doesn't rely on the underlying object size (512
bytes for dm-crypt I think, and we are looking at 4KiB blocks for the librbd
built-in encryption). I can't speak for CephFS+fscrypt, but I suspect it
wouldn't require re-encrypting the full file or backing object (probably a
4KiB page).

> 3. older clients will be able to support encryption out of the box, given
>    they will have no idea their data is being encrypted, nor how that is
>    happening.
>
>
> CHOOSING NAMESPACES
> -------------------
>
> While investigating where and how per-tenant encryption could be
> implemented, two other ideas were on the table:
>
> 1. on a per-client basis, relying on cephx entities, with an encryption key
>    per client, or a shared key amongst several clients; this key would be
>    kept encrypted in the monitor's kv store with the entity's cephx key.
>
> 2. on a per-pool basis.
>
> The first one would definitely be feasible, but potentially tricky to
> implement just right, without too many exceptions or involvement of other
> portions of the stack. E.g., dealing with metadata could become tricky.
> Then again, there wasn't one reason that could not be addressed and become
> a showstopper.
>
> As for 2., it would definitely be the easiest to implement: the pool is
> created with an 'encrypted' flag on, the key is kept in the monitors, and
> the OSDs encrypt any object belonging to that pool. The problem with this
> option, however, is how coarse-grained it is. If we really wanted a
> per-tenant approach, one would have to ensure one pool per tenant. Not
> necessarily a big deal if a lot of potentially small pools is acceptable.
> This idea was scrapped in favour of encrypting namespaces instead.
>
> Given RADOS already has the concept of a namespace, it might just be the
> ideal medium to implement such an approach, as we get the best of the two
> options above: we get finer-grained access than a pool provides, while
> still being able to limit access by entity through caps. We also get to
> have multiple namespaces in a single pool should we choose to do so. All
> the while, the concept is high-level enough that the actual encryption
> scheme might be implemented in a select handful of places, without the
> need for many (maybe any) particular exceptions or corner cases.
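
As an aside, the namespace plumbing is already in place on the client side,
so this part would slot in without new client work. For reference, a minimal
sketch using the librados Python bindings (the pool 'mypool', namespace
'tenant-a', and object name below are placeholders, and this assumes a
reachable cluster with python3-rados installed):

    import rados

    # Connect using the standard config; assumes a usable keyring.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('mypool')
        try:
            # Every subsequent op on this context is confined to the
            # 'tenant-a' namespace within the pool.
            ioctx.set_namespace('tenant-a')
            ioctx.write_full('greeting', b'hello from tenant-a')
            print(ioctx.read('greeting'))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

Combined with OSD caps along the lines of
'allow rw pool=mypool namespace=tenant-a', a cephx entity can already be
confined to its own namespace, which I assume is what a key-per-namespace
scheme would hang off of.
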
> APPROACH
> --------
>
> It is important to note that there are several implementation details,
> especially on "how exactly this is going to happen", that have not been
> fully figured out.
>
> Essentially, the objective is to ensure that objects from a given namespace
> are always encrypted or decrypted by bluestore when writing or reading the
> data. The hope is that performing encryption at this level will allow us to
>
> 1. ensure the operation is performed at the disk block size, so that small
>    or partial writes will not require a rewrite of the whole object; the
>    same goes for reads.
>
> 2. avoid dealing with all the mechanics involving objects and other
>    operations over them, and focus solely on their data and metadata.
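
On point 1, for what it's worth, that per-block property is essentially what
dm-crypt gets out of AES-XTS: each block is encrypted independently under a
tweak derived from its block number, so a partial overwrite only re-encrypts
the blocks it touches. A rough Python sketch of the idea, purely
illustrative (the 4KiB block size, the tweak derivation, and the cipher
choice are my assumptions, not a claim about what bluestore would actually
do):

    import os
    from cryptography.hazmat.primitives.ciphers import (
        Cipher, algorithms, modes,
    )

    BLOCK = 4096  # assumed block granularity

    def _xts(key, block_no):
        # The tweak is derived from the logical block number, so each
        # block encrypts/decrypts independently of its neighbours.
        tweak = block_no.to_bytes(16, 'little')
        return Cipher(algorithms.AES(key), modes.XTS(tweak))

    def encrypt_block(key, block_no, plaintext):
        enc = _xts(key, block_no).encryptor()
        return enc.update(plaintext) + enc.finalize()

    def decrypt_block(key, block_no, ciphertext):
        dec = _xts(key, block_no).decryptor()
        return dec.update(ciphertext) + dec.finalize()

    key = os.urandom(64)  # AES-256-XTS takes a 512-bit key
    data = os.urandom(BLOCK)
    assert decrypt_block(key, 0, encrypt_block(key, 0, data)) == data

The upshot is that rewriting one block never requires touching the rest of
the object, which is exactly the win over encrypting whole objects.
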
> Secret distribution is expected to be done by the monitors, at the OSDs'
> request. In an ideal world, the OSDs would know exactly which namespaces
> they might have to encrypt/decrypt, based on the pools they currently
> hold, and would request keys for those beforehand, so that they don't have
> to request a key from the monitor when an operation arrives. This would
> not only require us to become a bit more aware of namespaces, but keeping
> these keys cached might require the OSD to keep them encrypted in memory.
> What to use for that is something that hasn't been given much thought --
> maybe we could get away with using the OSD's cephx key.
>
> As for the namespaces, in their current form we don't have much (any?)
> information about them. Access to an object in a namespace is based on
> prior knowledge of that namespace and the object's name. We currently
> don't have statistics on namespaces, nor are we able to know whether an
> OSD keeps any object belonging to a namespace _before_ an operation on
> such an object is handled.
>
> Even though it's not particularly _required_ to get more out of namespaces
> than we currently have, it would definitely be ideal if we ended up with
> the ability to 1) have statistics out of namespaces, as that would be
> imperative if we're using them for tenants; and 2) be able to cache keys
> ahead of time for namespaces an OSD might have to handle (read, namespaces
> living in a pool with PGs mapped to that OSD).

-- 
Jason
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx