On Tue, Aug 4, 2020 at 11:55 AM Joao Eduardo Luis <joao@xxxxxxx> wrote:
>
> On 20/08/04 09:04AM, Jason Dillaman wrote:
> > On Mon, Aug 3, 2020 at 4:48 PM Joao Eduardo Luis <joao@xxxxxxx> wrote:
> > >
> > > Even though we currently have at-rest encryption, ensuring data security on
> > > the physical device, it is applied on a per-OSD basis and is too
> > > coarse-grained to allow different entities/clients/tenants to have their data
> > > encrypted with different keys.
> > >
> > > The intent here is to allow different tenants to have their data encrypted at
> > > rest, independently, and without necessarily relying on full OSD encryption.
> > > This way one could have anywhere from a handful to dozens or hundreds of
> > > tenants with their data encrypted on disk, while not having to maintain full
> > > at-rest encryption should the administrator consider it too cumbersome or
> > > unnecessary.
> >
> > I would be interested to hear the tenant use-case where they trust the
> > backing storage system (Ceph) with all things encryption and don't
> > have any effective control over the keys / ciphers / rotation policies
> > / etc. If you have a vulnerability that exposes the current OSD
> > dm-crypt keys, I would think it would be possible to get the
> > per-namespace keys through a similar vector if they are stored
> > effectively side-by-side?
>
> The idea here was not to berate the current at-rest scheme, nor to propose
> something better than it, but rather a use case where it is used instead.
> Maybe I'm being naive, but the trade-off of having the storage system handle
> the namespace keys is not much different from having it handle the dmcrypt
> keys.
>
> I'm in no way saying that Ceph handling the secrets is better than having the
> clients do their own encryption; it's just a different use case being
> addressed.

Totally understand. I am just honestly interested in whether this solves a
known issue for Ceph users (regulatory or otherwise), i.e. would pool-level
encryption check all the same boxes as namespace-level encryption?

> > > While there are very good arguments for ensuring this encryption is performed
> > > client-side, such that each client actively controls their own secrets, a
> > > server-side approach has several other benefits that may outweigh a
> > > client-side approach.
> > >
> > > On the one hand,
> > >
> > > * encrypting server-side means encrypting N times, depending on replication
> > >   size and scheme;
> > > * the secrets keyring will be centralized, likely in the monitor, much like
> > >   what we do for dmcrypt, albeit kept encrypted;
> > > * on-the-wire data will still need to rely on msgr2 encryption; though one
> > >   could argue that this will likely happen regardless of whether a client- or
> > >   server-side approach is being used.
> > >
> > > But on the other,
> > >
> > > 1. encryption becomes transparent for the client, avoiding the effort of
> > >    implementing such schemes in client libraries and kernel drivers;
> >
> > Just an FYI: krbd supports client-side encryption via dm-crypt, kernel CephFS
> > is actively looking to incorporate fscrypt, librbd can utilize QEMU-layered
> > LUKS for many use-cases, and work is in progress on built-in librbd
> > client-side encryption. RGW has had client-side encryption for a while.
>
> Very much aware. To be frank, this came up mostly to address cephfs
> encryption, but the approach seemed generic enough to write it up as such.
>
> As for cephfs, I've been following the fscrypt discussion, within reason and
> what's available in the ticket, and it didn't seem particularly in conflict
> with this proposal (other than potential duplication of effort).
>
> > > 2. tighter control over the unit of data being encrypted, reducing the load
> > >    of encrypting a whole object versus a disk block in bluestore.
> >
> > RBD client-side encryption doesn't rely on the underlying object size
> > (512 bytes for dm-crypt I think, and looking at 4KiB blocks for the
> > librbd built-in encryption). I can't speak for CephFS+fscrypt, but I
> > suspect it wouldn't require re-encrypting the full file or backing
> > object (probably a 4KiB page).
>
> TIL. Thanks :)
>
> -Joao
>
> > > 3. older clients will be able to support encryption out of the box, given
> > >    they will have no idea their data is being encrypted, nor how that is
> > >    happening.
> > >
> > >
> > > CHOOSING NAMESPACES
> > > --------------------
> > >
> > > While investigating where and how per-tenant encryption could be implemented,
> > > two other ideas were on the table:
> > >
> > > 1. on a per-client basis, relying on cephx entities, with an encryption key
> > >    per client, or a shared key amongst several clients; this key would be
> > >    kept encrypted in the monitor's kv store with the entity's cephx key.
> > >
> > > 2. on a per-pool basis.
> > >
> > > The first one would definitely be feasible, but potentially tricky to
> > > implement just right, without too many exceptions or too much involvement of
> > > other portions of the stack. E.g., dealing with metadata could become tricky.
> > > Then again, there wasn't any one issue that could not be addressed, nor one
> > > that would become a showstopper.
> > >
> > > As for 2., it would definitely be the easiest to implement: the pool is
> > > created with an 'encrypted' flag on, the key is kept in the monitors, and
> > > OSDs encrypt any object belonging to that pool. The problem with this option,
> > > however, is how coarse-grained it is. If we really wanted a per-tenant
> > > approach, one would have to ensure one pool per tenant. Not necessarily a big
> > > deal if having a lot of potentially small pools is fine. This idea was
> > > scrapped in favour of encrypting namespaces instead.
> > >
> > > Given RADOS already has the concept of a namespace, it might just be the
> > > ideal medium to implement such an approach, as we get the best of the two
> > > options above: access that is finer-grained than a pool, but still with the
> > > same capability of limiting access by entity through caps. We also get to
> > > have multiple namespaces in a single pool should we choose to do so. All the
> > > while, the concept is high-level enough that the actual encryption scheme
> > > might be implemented in a select handful of places, without the need for many
> > > (maybe any) particular exceptions or corner cases.
> > >
> > >
> > > APPROACH
> > > ---------
> > >
> > > It is important to note that there are several implementation details,
> > > especially on "how exactly this is going to happen", that have not been fully
> > > figured out.
> > >
> > > Essentially, the objective is to ensure that objects from a given namespace
> > > are always encrypted or decrypted by bluestore when writing or reading the
> > > data. The hope is that performing it at this level will allow us to
> > >
> > > 1. ensure the operation is performed at the disk block size, ensuring that
> > >    small writes, or partial writes, will not require a rewrite of the whole
> > >    object; the same goes for reads.
> > >
> > > 2. avoid dealing with all the mechanics involving objects and other
> > >    operations over them, and focus solely on their data and metadata.
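To make point 1 above a bit more concrete, here is a rough sketch of what
block-granular, per-namespace encryption could look like. It is purely
illustrative -- no cipher has been picked for this proposal, the names are
made up, and it obviously isn't bluestore code -- but it shows how a
per-block tweak (AES-XTS here, the construction dm-crypt typically uses)
means a partial overwrite only ever touches the blocks it covers:

# Illustrative sketch only -- not bluestore code. Shows per-namespace,
# block-granular encryption with AES-XTS; all names are invented here.
import hashlib
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BLOCK_SIZE = 4096  # operate at disk-block granularity, not per object

def _xts_tweak(object_id: str, block_index: int) -> bytes:
    # 16-byte tweak derived from the object and the block within it, so
    # every block encrypts differently without storing per-block IVs.
    return (hashlib.blake2b(object_id.encode(), digest_size=8).digest()
            + block_index.to_bytes(8, "little"))

def encrypt_block(namespace_key: bytes, object_id: str,
                  block_index: int, plaintext: bytes) -> bytes:
    # namespace_key is 64 bytes: AES-256-XTS takes a 512-bit (two-part) key.
    enc = Cipher(algorithms.AES(namespace_key),
                 modes.XTS(_xts_tweak(object_id, block_index))).encryptor()
    return enc.update(plaintext) + enc.finalize()

def decrypt_block(namespace_key: bytes, object_id: str,
                  block_index: int, ciphertext: bytes) -> bytes:
    dec = Cipher(algorithms.AES(namespace_key),
                 modes.XTS(_xts_tweak(object_id, block_index))).decryptor()
    return dec.update(ciphertext) + dec.finalize()

# A 4 KiB overwrite at offset 8192 only re-encrypts block 2 of the object;
# the rest of the object's blocks are left untouched. Same for reads.
key = os.urandom(64)
data = os.urandom(BLOCK_SIZE)
stored = encrypt_block(key, "obj.1234", 2, data)
assert decrypt_block(key, "obj.1234", 2, stored) == data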

> > > Secret distribution is expected to be done by the monitors, at the OSDs'
> > > request. In an ideal world, the OSDs would know exactly which namespaces
> > > they might have to encrypt/decrypt, based on the pools they currently hold,
> > > and request keys for those beforehand, such that they don't have to request
> > > a key from the monitor when an operation arrives. This would not only
> > > require us to become a bit more aware of namespaces, but keeping these keys
> > > cached might require the osd to keep them encrypted in memory. What to use
> > > for that is something that hasn't been given much thought yet -- maybe we
> > > could get away with using the osd's cephx key.
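To make that caching idea a bit more concrete, here is a rough sketch of the
flow. Everything in it is invented for illustration (the class names, the
re-wrapping step, deriving the OSD's in-memory wrapping key from its cephx
key) -- it is just one way the "keys cached, but kept encrypted in memory"
idea could hang together:

# Illustrative sketch only -- all names and the flow itself are assumptions.
import os
from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap

class MonKeyStore:
    """Monitor side: one 512-bit AES-XTS key per (pool, namespace), stored
    wrapped with a store-level KEK, much like dm-crypt keys today."""
    def __init__(self, store_kek: bytes):
        self._kek = store_kek            # 32-byte KEK guarding the mon kv store
        self._kv = {}                    # (pool, namespace) -> wrapped key

    def create_namespace_key(self, pool: str, namespace: str) -> None:
        self._kv[(pool, namespace)] = aes_key_wrap(self._kek, os.urandom(64))

    def get_wrapped_key(self, pool: str, namespace: str,
                        osd_wrapping_key: bytes) -> bytes:
        # Re-wrap for the requesting OSD so the raw key never sits in the
        # clear (on top of whatever msgr2 secure mode already provides).
        raw = aes_key_unwrap(self._kek, self._kv[(pool, namespace)])
        return aes_key_wrap(osd_wrapping_key, raw)

class OsdKeyCache:
    """OSD side: prefetch wrapped keys for namespaces in pools this OSD
    hosts; keep them wrapped in memory and unwrap only per operation."""
    def __init__(self, osd_wrapping_key: bytes, mon: MonKeyStore):
        self._wkey = osd_wrapping_key    # e.g. derived from the osd's cephx key
        self._mon = mon
        self._cache = {}                 # (pool, namespace) -> wrapped key

    def prefetch(self, pool: str, namespace: str) -> None:
        self._cache[(pool, namespace)] = self._mon.get_wrapped_key(
            pool, namespace, self._wkey)

    def key_for_op(self, pool: str, namespace: str) -> bytes:
        if (pool, namespace) not in self._cache:  # fall back to on-demand fetch
            self.prefetch(pool, namespace)
        return aes_key_unwrap(self._wkey, self._cache[(pool, namespace)])

# Usage sketch
mon = MonKeyStore(os.urandom(32))
mon.create_namespace_key("mypool", "tenant-a")
osd = OsdKeyCache(os.urandom(32), mon)
osd.prefetch("mypool", "tenant-a")
nskey = osd.key_for_op("mypool", "tenant-a")   # 64 bytes, ready for AES-XTS
assert len(nskey) == 64

The point being that the monitor's kv store and the OSD's cache only ever
hold wrapped keys, and an unwrapped key exists only for the duration of an
op. Whether wrapping with something derived from the osd's cephx key is
strong enough is exactly the open question above.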

> > > As for the namespaces, in their current form we don't have much (any?)
> > > information about them. Access to an object in a namespace is based on prior
> > > knowledge of that namespace and the object's name. We currently don't have
> > > statistics on namespaces, nor are we able to know whether an OSD keeps any
> > > object belonging to a namespace _before_ an operation on such an object is
> > > handled.
> > >
> > > Even though it's not particularly _required_ to get more out of namespaces
> > > than we currently have, it would definitely be ideal if we ended up with the
> > > ability to 1) have statistics on namespaces, as that would be imperative if
> > > we're using them for tenants; and 2) cache ahead of time the keys for
> > > namespaces an osd might have to handle (read, namespaces living in a pool
> > > with PGs mapped to a given osd).
> >
> > --
> > Jason
>

--
Jason
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx