Proposal: Encrypted Namespaces

Joao Eduardo Luis <joao@xxxxxxx> · Mon, 3 Aug 2020 20:48:10 +0000

This proposal might be a bit thin on details, but I would love to have some
feedback, gauge the broader community's and developers' interest, and have
people poke holes in the current idea.

All comments welcome.

  -Joao

MOTIVATION
----------

Even though we currently have at-rest encryption, ensuring data security on the
physical device, it is applied on a per-OSD basis, which is too coarse-grained
to allow different entities/clients/tenants to have their data encrypted with
different keys.

The intent here is to allow different tenants to have their data encrypted at
rest, independently, and without necessarily relying on full OSD encryption.
This way one could have anywhere from a handful to dozens or hundreds of
tenants with their data encrypted on disk, while not having to maintain full
at-rest encryption should the administrator consider it too cumbersome or
unnecessary.

While there are very good arguments for ensuring this encryption is performed
client-side, such that each client actively controls their own secrets, a
server-side approach has several other benefits that may outweigh those of a
client-side approach.

On the one hand,

* encrypting server side means encrypting N times, depending on replication
  size and scheme;
* the secrets keyring will be centralized, likely in the monitor, much like
  what we do for dmcrypt, even though the keys would be kept encrypted;
* on-the-wire data will still need to rely on msgr2 encryption, even though
  one could argue that this will likely happen regardless of whether a client-
  or server-side approach is being used.

But on the other,

1. encryption becomes transparent for the client, avoiding the effort of
   implementing such schemes in client libraries and kernel drivers;
2. tighter control over the unit of data being encrypted, reducing the load
   from encrypting a whole object to encrypting a disk block in bluestore;
3. older clients will be able to support encryption out of the box, given they
   will have no idea that their data is being encrypted, nor how that is
   happening.

CHOOSING NAMESPACES
-------------------

While investigating where and how per-tenant encryption could be implemented,
two other ideas were on the table:

1. on a per-client basis, relying on cephx entities, with an encryption key
   per-client, or a shared key amongst several clients; this key would be kept
   encrypted in the monitor's kv store with the entity's cephx key.

2. on a per-pool basis.

The first one would definitely be feasible, but potentially tricky to
implement just right, without too many exceptions or involvement of other
portions of the stack. E.g., dealing with metadata could become tricky. Then
again, no issue came up that could not be addressed, so none of them is a
showstopper.

As for 2., it would definitely be the easiest to implement: a pool is created
with an 'encrypted' flag on, the key is kept in the monitors, and OSDs encrypt
any object belonging to that pool. The problem with this option, however, is
how coarse-grained it is. If we really wanted a per-tenant approach, one would
have to ensure one pool per tenant. Not necessarily a big deal if a lot of
potentially small pools are acceptable. This idea was scrapped in favour of
encrypting namespaces instead.

Given RADOS already has the concept of a namespace, it might just be the ideal
medium to implement such an approach, as we get the best of the two options
above: we get finer-grained access than a pool, but still with the same
capabilities of limiting access by entity through caps. We also get to have
multiple namespaces in a single pool should we choose to do so. All the
while, the concept is high-level enough that the effort of implementing the
actual encryption scheme might be confined to a select handful of places,
without the need for many (maybe any) particular exceptions or corner cases.

APPROACH
--------

It is important to note that there are several implementation details,
especially on "how exactly this is going to happen", that have not been fully
figured out.

Essentially, the objective is to ensure that objects from a given namespace are
always encrypted or decrypted by bluestore when writing or reading the data. The
hope is that performing the encryption at this level will allow us to

1. ensure the operation is performed at the disk block size, ensuring that
   small writes, or partial writes, will not require a rewrite of the whole
   object; same goes for reads.

2. avoid dealing with all the mechanics involving objects and other operations
   over them, and focus solely on their data and metadata.
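
To illustrate point 1, here is a minimal sketch of block-granular encryption,
in Python for brevity. Everything in it is an assumption made for illustration:
the 4096-byte block size, the EncryptedObject class, and in particular the
toy HMAC-based keystream, which merely stands in for a vetted cipher mode and
must not be used as-is. It only aims to show that a small write decrypts and
re-encrypts the blocks it touches rather than the whole object.

```python
import hashlib
import hmac

BLOCK_SIZE = 4096  # assumed disk block size, for illustration only


def _keystream(key: bytes, obj: str, block_no: int) -> bytes:
    """Toy per-block keystream (NOT secure); stands in for a real cipher mode."""
    out = bytearray()
    counter = 0
    while len(out) < BLOCK_SIZE:
        msg = f"{obj}:{block_no}:{counter}".encode()
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:BLOCK_SIZE])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class EncryptedObject:
    """Hypothetical object stored as independently encrypted blocks."""

    def __init__(self, key: bytes, name: str, size: int):
        self.key = key
        self.name = name
        nblocks = -(-size // BLOCK_SIZE)  # ceiling division
        # Ciphertext of an all-zero object, one entry per block.
        self.blocks = [
            _xor(bytes(BLOCK_SIZE), _keystream(key, name, i))
            for i in range(nblocks)
        ]
        self.blocks_rewritten = 0

    def write(self, offset: int, data: bytes) -> None:
        """Write within the object's bounds, touching only affected blocks."""
        first = offset // BLOCK_SIZE
        last = (offset + len(data) - 1) // BLOCK_SIZE
        for b in range(first, last + 1):
            stream = _keystream(self.key, self.name, b)
            plain = bytearray(_xor(self.blocks[b], stream))
            start = max(offset, b * BLOCK_SIZE)
            end = min(offset + len(data), (b + 1) * BLOCK_SIZE)
            plain[start - b * BLOCK_SIZE:end - b * BLOCK_SIZE] = \
                data[start - offset:end - offset]
            self.blocks[b] = _xor(bytes(plain), stream)
            self.blocks_rewritten += 1

    def read(self, offset: int, length: int) -> bytes:
        """Decrypt only the blocks covering [offset, offset + length)."""
        out = bytearray()
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        for b in range(first, last + 1):
            plain = _xor(self.blocks[b], _keystream(self.key, self.name, b))
            start = max(offset, b * BLOCK_SIZE) - b * BLOCK_SIZE
            end = min(offset + length, (b + 1) * BLOCK_SIZE) - b * BLOCK_SIZE
            out += plain[start:end]
        return bytes(out)
```

With a 16-block object, a 5-byte write at offset 5000 re-encrypts a single
block, and a 4-byte write straddling a block boundary re-encrypts two.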

Secret distribution is expected to be done by the monitors, at the OSDs'
request. In an ideal world, the OSDs would know exactly which namespaces they
might have to encrypt/decrypt, based on pools they currently hold, and request
keys for those beforehand, such that they don't have to request a key from the
monitor when an operation arrives. This would not only require us to become a
bit more aware of namespaces, but keeping these keys cached might also require
the OSD to keep them encrypted in memory. What to use for that is something
that hasn't been given much thought -- maybe we could get away with using the
OSD's cephx key.
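
As a rough sketch of the caching idea, in Python for brevity: the class below
fetches a namespace key from the monitor on first use (or ahead of time via
prefetch) and keeps it masked in memory. All names here are hypothetical
(NamespaceKeyCache, fetch_from_mon, osd_secret), keys are assumed to be 32
bytes, and the XOR mask derived from the OSD's secret is only a toy stand-in
for a proper key-wrapping scheme.

```python
import hashlib
import hmac
from typing import Callable, Dict, Iterable, Tuple

NsId = Tuple[str, str]  # (pool, namespace)


class NamespaceKeyCache:
    """Hypothetical OSD-side cache of per-namespace encryption keys."""

    def __init__(self, fetch_from_mon: Callable[[NsId], bytes],
                 osd_secret: bytes):
        self._fetch = fetch_from_mon      # stands in for a monitor request
        self._osd_secret = osd_secret     # e.g. derived from the OSD's key
        self._wrapped: Dict[NsId, bytes] = {}
        self.mon_requests = 0

    def _mask(self, ns: NsId) -> bytes:
        # 32-byte mask bound to this OSD's secret and the namespace identity.
        return hmac.new(self._osd_secret, "/".join(ns).encode(),
                        hashlib.sha256).digest()

    def _wrap(self, ns: NsId, key: bytes) -> bytes:
        # Toy wrapping (NOT secure): XOR is its own inverse, so the same
        # function both wraps and unwraps.
        if len(key) != 32:
            raise ValueError("this sketch assumes 32-byte keys")
        return bytes(a ^ b for a, b in zip(key, self._mask(ns)))

    def prefetch(self, namespaces: Iterable[NsId]) -> None:
        """Request keys ahead of time so operations do not wait on the mon."""
        for ns in namespaces:
            self.get(ns)

    def get(self, ns: NsId) -> bytes:
        """Return the plaintext key, asking the monitor only on a miss."""
        if ns not in self._wrapped:
            self.mon_requests += 1
            self._wrapped[ns] = self._wrap(ns, self._fetch(ns))
        return self._wrap(ns, self._wrapped[ns])
```

In this sketch a key that was prefetched costs no monitor round-trip when an
operation arrives, while a namespace the OSD did not know about falls back to
an on-demand request.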

As for the namespaces, in their current form we don't have much (any?)
information about them. Access to an object in a namespace is based on prior
knowledge of that namespace and the object's name. We currently don't have
statistics on namespaces, nor are we able to know whether an OSD keeps any
object belonging to a namespace _before_ an operation on such an object is
handled.

Even though it's not particularly _required_ to get more out of namespaces than
we currently have, it would definitely be ideal if we ended up with the ability
to 1) have statistics out of namespaces, as it would be imperative if we're
using them for tenants; and 2) be able to cache keys ahead of time for
namespaces an OSD might have to handle (read, namespaces living in a pool with
PGs mapped to a given OSD).
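
Both wishes can be sketched in a few lines of Python. This assumes something
that does not exist today: a registry of which namespaces live in which pool
(pool_namespaces below). The pg_to_osds map, the function name and the
NamespaceStats class are likewise hypothetical and only illustrate the shape
of the information that would be needed.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

PgId = Tuple[str, int]   # (pool, pg number)
NsId = Tuple[str, str]   # (pool, namespace)


def namespaces_for_osd(osd_id: int,
                       pg_to_osds: Dict[PgId, List[int]],
                       pool_namespaces: Dict[str, Set[str]]) -> Set[NsId]:
    """Namespaces living in any pool that has a PG mapped to osd_id."""
    pools = {pool for (pool, _pg), osds in pg_to_osds.items()
             if osd_id in osds}
    return {(pool, ns)
            for pool in pools
            for ns in pool_namespaces.get(pool, set())}


class NamespaceStats:
    """Hypothetical per-namespace counters, updated as writes are handled."""

    def __init__(self) -> None:
        self.objects: Dict[NsId, int] = defaultdict(int)
        self.bytes: Dict[NsId, int] = defaultdict(int)

    def record_write(self, ns: NsId, new_object: bool,
                     delta_bytes: int) -> None:
        if new_object:
            self.objects[ns] += 1
        self.bytes[ns] += delta_bytes
```

The set returned by namespaces_for_osd is what an OSD would ask keys for ahead
of time; it over-approximates, since the OSD may hold no object from some of
those namespaces.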
