On Tue, Aug 4, 2020 at 11:55 AM Joao Eduardo Luis <joao@xxxxxxx> wrote:
>
> On 20/08/04 09:04AM, Jason Dillaman wrote:
> > On Mon, Aug 3, 2020 at 4:48 PM Joao Eduardo Luis <joao@xxxxxxx> wrote:
> > >
> > > Even though we currently have at-rest encryption, ensuring data security on
> > > the physical device, it is applied on a per-OSD basis and is too
> > > coarse-grained to allow different entities/clients/tenants to have their data
> > > encrypted with different keys.
> > >
> > > The intent here is to allow different tenants to have their data encrypted at
> > > rest, independently, and without necessarily relying on full OSD encryption.
> > > This way one could have anywhere from a handful to dozens or hundreds of
> > > tenants with their data encrypted on disk, while not having to maintain full
> > > at-rest encryption should the administrator consider it too cumbersome or
> > > unnecessary.
> >
> > I would be interested to hear the tenant use-case where they trust the
> > backing storage system (Ceph) with all things encryption and don't
> > have any effective control over the keys / ciphers / rotation policies
> > / etc. If you have a vulnerability that exposes the current OSD
> > dm-crypt keys, I would think it would be possible to get the
> > per-namespace keys through a similar vector if they are stored
> > effectively side-by-side?
>
> The idea here was not to berate the current at-rest scheme, nor to propose
> something better than it, but rather a use case where it is used instead.
> Maybe I'm being naive, but the trade-off of having the storage system handle
> the namespace keys is not much different from having it handle the dmcrypt
> keys.
>
> I'm in no way saying that Ceph handling the secrets is better than having the
> clients do their own encryption; it's just a different use case being
> addressed.

Totally understand. I am just honestly interested in whether this solves a
known issue for Ceph users (regulatory or otherwise), i.e. would pool-level
encryption check all the same boxes as namespace-level encryption?

> > > While there are very good arguments for ensuring this encryption is performed
> > > client-side, such that each client actively controls their own secrets, a
> > > server-side approach has several other benefits that may outweigh a
> > > client-side approach.
> > >
> > > On the one hand,
> > >
> > > * encrypting server-side means encrypting N times, depending on replication
> > >   size and scheme;
> > > * the secrets keyring will be centralized, likely in the monitor, much like
> > >   what we do for dmcrypt, albeit kept encrypted;
> > > * on-the-wire data will still need to rely on msgr2 encryption; though one
> > >   could argue that this will likely happen regardless of whether a client- or
> > >   server-side approach is being used.
> > >
> > > But on the other,
> > >
> > > 1. encryption becomes transparent for the client, avoiding the effort of
> > >    implementing such schemes in client libraries and kernel drivers;
> >
> > Just an FYI: krbd supports client-side encryption via dm-crypt, kernel CephFS
> > is actively looking to incorporate fscrypt, librbd can utilize QEMU-layered
> > LUKS for many use-cases, and work is in progress on built-in librbd
> > client-side encryption. RGW has had client-side encryption for a while.
>
> Very much aware. To be frank, this came up mostly to address cephfs
> encryption, but the approach seemed generic enough to write it up as such.
>
> As for cephfs, I've been following the fscrypt discussion, within reason and
> what's available in the ticket, and it didn't seem particularly in conflict
> with this proposal (other than potential duplication of effort).
>
> > > 2. tighter control over the unit of data being encrypted, reducing the load
> > >    of encrypting a whole object versus a disk block in bluestore.
> >
> > RBD client-side encryption doesn't rely on the underlying object size
> > (512 bytes for dm-crypt I think, and looking at 4KiB blocks for the
> > librbd built-in encryption). I can't speak for CephFS+fscrypt, but I
> > suspect it wouldn't require re-encrypting the full file or backing
> > object (probably a 4KiB page).
>
> TIL. Thanks :)
>
> -Joao
>
> > > 3. older clients will be able to support encryption out of the box, given
> > >    they will have no idea their data is being encrypted, nor how that is
> > >    happening.
> > >
> > >
> > > CHOOSING NAMESPACES
> > > --------------------
> > >
> > > While investigating where and how per-tenant encryption could be implemented,
> > > two other ideas were on the table:
> > >
> > > 1. on a per-client basis, relying on cephx entities, with an encryption key
> > >    per client, or a shared key amongst several clients; this key would be
> > >    kept encrypted in the monitor's kv store with the entity's cephx key.
> > >
> > > 2. on a per-pool basis.
> > >
> > > The first one would definitely be feasible, but potentially tricky to
> > > implement just right, without too many exceptions or too much involvement of
> > > other portions of the stack. E.g., dealing with metadata could become tricky.
> > > Then again, there wasn't any one issue that could not be addressed, nor one
> > > that would become a showstopper.
> > >
> > > As for 2., it would definitely be the easiest to implement: the pool is
> > > created with an 'encrypted' flag on, the key is kept in the monitors, and
> > > OSDs encrypt any object belonging to that pool. The problem with this option,
> > > however, is how coarse-grained it is. If we really wanted a per-tenant
> > > approach, one would have to ensure one pool per tenant. Not necessarily a big
> > > deal if having a lot of potentially small pools is fine. This idea was
> > > scrapped in favour of encrypting namespaces instead.
> > >
> > > Given RADOS already has the concept of a namespace, it might just be the
> > > ideal medium to implement such an approach, as we get the best of the two
> > > options above: access that is finer-grained than a pool, but still with the
> > > same capability of limiting access by entity through caps. We also get to
> > > have multiple namespaces in a single pool should we choose to do so. All the
> > > while, the concept is high-level enough that the actual encryption scheme
> > > might be implemented in a select handful of places, without the need for many
> > > (maybe any) particular exceptions or corner cases.
> > >
> > >
> > > APPROACH
> > > ---------
> > >
> > > It is important to note that there are several implementation details,
> > > especially on "how exactly this is going to happen", that have not been fully
> > > figured out.
> > >
> > > Essentially, the objective is to ensure that objects from a given namespace
> > > are always encrypted or decrypted by bluestore when writing or reading the
> > > data. The hope is that performing it at this level will allow us to
> > >
> > > 1. ensure the operation is performed at the disk block size, ensuring that
> > >    small writes, or partial writes, will not require a rewrite of the whole
> > >    object; the same goes for reads.
> > >
> > > 2. avoid dealing with all the mechanics involving objects and other
> > >    operations over them, and focus solely on their data and metadata.
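To make point 1 above a bit more concrete, here is a rough sketch of what
block-granular, per-namespace encryption could look like. It is purely
illustrative -- no cipher has been picked for this proposal, the names are
made up, and it obviously isn't bluestore code -- but it shows how a
per-block tweak (AES-XTS here, the construction dm-crypt typically uses)
means a partial overwrite only ever touches the blocks it covers:

# Illustrative sketch only -- not bluestore code. Shows per-namespace,
# block-granular encryption with AES-XTS; all names are invented here.
import hashlib
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BLOCK_SIZE = 4096  # operate at disk-block granularity, not per object

def _xts_tweak(object_id: str, block_index: int) -> bytes:
    # 16-byte tweak derived from the object and the block within it, so
    # every block encrypts differently without storing per-block IVs.
    return (hashlib.blake2b(object_id.encode(), digest_size=8).digest()
            + block_index.to_bytes(8, "little"))

def encrypt_block(namespace_key: bytes, object_id: str,
                  block_index: int, plaintext: bytes) -> bytes:
    # namespace_key is 64 bytes: AES-256-XTS takes a 512-bit (two-part) key.
    enc = Cipher(algorithms.AES(namespace_key),
                 modes.XTS(_xts_tweak(object_id, block_index))).encryptor()
    return enc.update(plaintext) + enc.finalize()

def decrypt_block(namespace_key: bytes, object_id: str,
                  block_index: int, ciphertext: bytes) -> bytes:
    dec = Cipher(algorithms.AES(namespace_key),
                 modes.XTS(_xts_tweak(object_id, block_index))).decryptor()
    return dec.update(ciphertext) + dec.finalize()

# A 4 KiB overwrite at offset 8192 only re-encrypts block 2 of the object;
# the rest of the object's blocks are left untouched. Same for reads.
key = os.urandom(64)
data = os.urandom(BLOCK_SIZE)
stored = encrypt_block(key, "obj.1234", 2, data)
assert decrypt_block(key, "obj.1234", 2, stored) == data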

> > > Secret distribution is expected to be done by the monitors, at the OSDs'
> > > request. In an ideal world, the OSDs would know exactly which namespaces
> > > they might have to encrypt/decrypt, based on the pools they currently hold,
> > > and request keys for those beforehand, such that they don't have to request
> > > a key from the monitor when an operation arrives. This would not only
> > > require us to become a bit more aware of namespaces, but keeping these keys
> > > cached might require the osd to keep them encrypted in memory. What to use
> > > for that is something that hasn't been given much thought yet -- maybe we
> > > could get away with using the osd's cephx key.
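To make that caching idea a bit more concrete, here is a rough sketch of the
flow. Everything in it is invented for illustration (the class names, the
re-wrapping step, deriving the OSD's in-memory wrapping key from its cephx
key) -- it is just one way the "keys cached, but kept encrypted in memory"
idea could hang together:

# Illustrative sketch only -- all names and the flow itself are assumptions.
import os
from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap

class MonKeyStore:
    """Monitor side: one 512-bit AES-XTS key per (pool, namespace), stored
    wrapped with a store-level KEK, much like dm-crypt keys today."""
    def __init__(self, store_kek: bytes):
        self._kek = store_kek            # 32-byte KEK guarding the mon kv store
        self._kv = {}                    # (pool, namespace) -> wrapped key

    def create_namespace_key(self, pool: str, namespace: str) -> None:
        self._kv[(pool, namespace)] = aes_key_wrap(self._kek, os.urandom(64))

    def get_wrapped_key(self, pool: str, namespace: str,
                        osd_wrapping_key: bytes) -> bytes:
        # Re-wrap for the requesting OSD so the raw key never sits in the
        # clear (on top of whatever msgr2 secure mode already provides).
        raw = aes_key_unwrap(self._kek, self._kv[(pool, namespace)])
        return aes_key_wrap(osd_wrapping_key, raw)

class OsdKeyCache:
    """OSD side: prefetch wrapped keys for namespaces in pools this OSD
    hosts; keep them wrapped in memory and unwrap only per operation."""
    def __init__(self, osd_wrapping_key: bytes, mon: MonKeyStore):
        self._wkey = osd_wrapping_key    # e.g. derived from the osd's cephx key
        self._mon = mon
        self._cache = {}                 # (pool, namespace) -> wrapped key

    def prefetch(self, pool: str, namespace: str) -> None:
        self._cache[(pool, namespace)] = self._mon.get_wrapped_key(
            pool, namespace, self._wkey)

    def key_for_op(self, pool: str, namespace: str) -> bytes:
        if (pool, namespace) not in self._cache:  # fall back to on-demand fetch
            self.prefetch(pool, namespace)
        return aes_key_unwrap(self._wkey, self._cache[(pool, namespace)])

# Usage sketch
mon = MonKeyStore(os.urandom(32))
mon.create_namespace_key("mypool", "tenant-a")
osd = OsdKeyCache(os.urandom(32), mon)
osd.prefetch("mypool", "tenant-a")
nskey = osd.key_for_op("mypool", "tenant-a")   # 64 bytes, ready for AES-XTS
assert len(nskey) == 64

The point being that the monitor's kv store and the OSD's cache only ever
hold wrapped keys, and an unwrapped key exists only for the duration of an
op. Whether wrapping with something derived from the osd's cephx key is
strong enough is exactly the open question above.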

> > > As for the namespaces, in their current form we don't have much (any?)
> > > information about them. Access to an object in a namespace is based on prior
> > > knowledge of that namespace and the object's name. We currently don't have
> > > statistics on namespaces, nor are we able to know whether an OSD keeps any
> > > object belonging to a namespace _before_ an operation on such an object is
> > > handled.
> > >
> > > Even though it's not particularly _required_ to get more out of namespaces
> > > than we currently have, it would definitely be ideal if we ended up with the
> > > ability to 1) have statistics on namespaces, as that would be imperative if
> > > we're using them for tenants; and 2) cache ahead of time the keys for
> > > namespaces an osd might have to handle (read, namespaces living in a pool
> > > with PGs mapped to a given osd).
> >
> > --
> > Jason
>

--
Jason
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx