Re: Proposal: Encrypted Namespaces

On Tue, 2020-08-04 at 09:04 -0400, Jason Dillaman wrote:
> On Mon, Aug 3, 2020 at 4:48 PM Joao Eduardo Luis <joao@xxxxxxx> wrote:
> > 
> > This proposal might be a bit thin on details, but I would love to have some
> > feedback and gauge the broader community's and developer's interest, as well as
> > to poke holes in the current idea.
> > 
> > All comments welcome.
> > 
> >   -Joao
> > 
> > 
> > 
> > MOTIVATION
> > ----------
> > 
> > Even though we currently have at-rest encryption, ensuring data security on the
> > physical device, it is done on a per-OSD basis, which is too coarse-grained
> > to allow different entities/clients/tenants to have their data encrypted with
> > different keys.
> > 
> > The intent here is to allow different tenants to have their data encrypted at
> > rest, independently, and without necessarily relying on full osd encryption.
> > This way one could have anywhere between a handful to dozens or hundreds of
> > tenants with their data encrypted on disk, while not having to maintain full
> > at-rest encryption should the administrator consider it too cumbersome or
> > unnecessary.
> 
> I would be interested to hear the tenant use-case where they trust the
> backing storage system (Ceph) with all things encryption and don't
> have any effective control over the keys / ciphers / rotation policies
> / etc. If you have a vulnerability that exposes the current OSD
> dm-crypt keys, I would think it would be possible to get the
> per-namespace keys through a similar vector if they are stored
> effectively side-by-side?
> 

Agreed. If I were a cloud tenant, I'd not be thrilled at a scheme that
required me to trust the OSD to encrypt my cleartext data for me.

You might want to take a step back and consider that there are really
two problems you have to deal with:

1/ encryption: where and how do I perform the encryption of the data?

I think doing this as close to the edges as possible would be best, as
it means fewer trusted parties. Conceptually, that makes this part
fairly straightforward. Just run the crypto over the appropriate buffers
before you send and just after you receive. You will need to do things
like ensure that someone with a bad key can't corrupt data in an
encrypted namespace.
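
To make that concrete, here's a rough sketch of "crypto at the edges"
with an AEAD cipher, which also gives you the bad-key protection for
free, since a wrong key fails authentication instead of producing
garbage. Purely illustrative python using the third-party
"cryptography" package -- none of this is existing ceph code:

    # Illustrative sketch only -- not ceph code. Assumes the third-party
    # "cryptography" package (pip install cryptography).
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.exceptions import InvalidTag

    def seal(key, nonce, plaintext, aad):
        # encrypt-and-authenticate the buffer just before it goes on the wire
        return AESGCM(key).encrypt(nonce, plaintext, aad)

    def unseal(key, nonce, ciphertext, aad):
        # decrypt just after receive; a wrong key raises InvalidTag rather
        # than handing back (or writing out) garbage
        return AESGCM(key).decrypt(nonce, ciphertext, aad)

    tenant_key = AESGCM.generate_key(bit_length=256)
    wrong_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    blob = seal(tenant_key, nonce, b"tenant data", b"pool/ns/object")

    assert unseal(tenant_key, nonce, blob, b"pool/ns/object") == b"tenant data"
    try:
        unseal(wrong_key, nonce, blob, b"pool/ns/object")
    except InvalidTag:
        print("bad key rejected; nothing written, nothing corrupted")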

2/ key management: how do I get keys to feed into the crypto engine?

This is the hard part, IMO. While its scheme isn't perfect, you may
want to look closely at how Linux fscrypt works. Basically, each fs has
a master key, and you then derive keys for the individual inodes from
that. Every inode has a nonce generated for it, which is used to ensure
that (e.g.) two identical files don't have identical encrypted contents.
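
Very roughly, the derivation step looks something like the sketch
below. It's a stdlib-only, HKDF-style illustration of the idea;
fscrypt v2 itself uses HKDF-SHA512 with its own key identifiers, so
treat the details as assumptions:

    # Rough sketch of the fscrypt-style idea: derive a per-inode key from
    # a master key plus a per-inode nonce. Stdlib-only, HKDF-style
    # construction for illustration; not the exact KDF fscrypt uses.
    import hashlib, hmac, os

    def hkdf_sha256(master_key, salt, info, length=32):
        # extract, then expand (RFC 5869 style)
        prk = hmac.new(salt, master_key, hashlib.sha256).digest()
        okm, block, counter = b"", b"", 1
        while len(okm) < length:
            block = hmac.new(prk, block + info + bytes([counter]),
                             hashlib.sha256).digest()
            okm += block
            counter += 1
        return okm[:length]

    master_key = os.urandom(32)    # one per fs / tenant, itself stored encrypted
    inode_nonce = os.urandom(16)   # generated per inode, stored alongside it
    per_inode_key = hkdf_sha256(master_key, b"fscrypt-sketch",
                                b"per-inode" + inode_nonce)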

The master keys are stored encrypted themselves, and the filesystem has
a number of ways that you can set up to unlock them -- passwords, hard
tokens, etc. Note that this part requires special tools to set up the
keys.

You'll probably want to aim for some sort of similar hierarchy of keys
here, I think. You may need a "special" per-object xattr or something to
store nonces, and you'll need to think about where they get generated.
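
For instance, with the python rados bindings a per-object nonce xattr
could look roughly like this. The xattr name, pool and namespace are
made up, and in practice you'd want the nonce generated and written
at object creation time:

    # Sketch: keep a per-object nonce in an xattr so the key-derivation
    # step can find it on every read. Names ("_crypt_nonce", "mypool",
    # "tenant-a") are made up for illustration.
    import os
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    ioctx = cluster.open_ioctx("mypool")
    ioctx.set_namespace("tenant-a")

    obj = "some-object"
    ioctx.write_full(obj, b"")                   # make sure the object exists
    nonce = os.urandom(16)
    ioctx.set_xattr(obj, "_crypt_nonce", nonce)  # written once, at creation

    # later, on the read path, fetch it back to derive the per-object key
    stored_nonce = ioctx.get_xattr(obj, "_crypt_nonce")
    assert stored_nonce == nonce

    ioctx.close()
    cluster.shutdown()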

One question: are you planning to encrypt object names too? Ideally,
you want to allow any client-generated data to be encrypted. In a
filesystem, that's basically filenames and contents. An object store is
probably pretty similar in that regard.

> > While there are very good arguments for ensuring this encryption is performed
> > client-side, such that each client actively controls their own secrets, a
> > server-side approach has several other benefits that may outweigh a client-side
> > approach.
> > 
> > On the one hand,
> > 
> > * encrypting server side means encrypting N times, depending on replication
> >   size and scheme;
> > * the secrets keyring will be centralized, likely in the monitor, much like
> >   what we do for dmcrypt, albeit kept encrypted;
> > * on-the-wire data will still need to rely on msgr2 encryption; even though
> >   one could argue that this will likely happen regardless of whether a client-
> >   or server-side approach is being used.
> > 
> > But on the other,
> > 
> > 1. encryption becomes transparent for the client, avoiding the effort of
> >    implementing such schemes in client libraries and kernel drivers;
> 
> Just an FYI: krbd supports client-side via dm-crypt, kernel CephFS is
> actively looking to incorporate fscrypt, librbd can utilize
> QEMU-layered LUKS for many use-cases and work is in-progress on
> built-in librbd client-side encryption. RGW has had client-side
> encryption for a while.
> 
> > 2. tighter control over the unit of data being encrypted, reducing the cost by
> >    encrypting a disk block in bluestore rather than a whole object.
> 
> RBD client-side encryption doesn't rely on the underlying object size
> (512 bytes for dm-crypt I think and looking at 4KiB blocks for the
> librbd built-in encryption). I can't speak for CephFS+fscrypt, but I
> suspect it wouldn't require re-encrypting the full file or backing
> object (probably 4KiB page).
> 

For fscrypt, the client basically just encrypts/decrypts file data on a
per-block basis. For ceph, that means we'll just operate on it a page at
a time.

fwiw, the data path looks reasonably simple to deal with. The hard part
is dealing with encrypted filenames.
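
Roughly, the per-block data path would have this shape (illustrative
python with the third-party "cryptography" package; fscrypt's actual
on-disk format and cipher modes differ):

    # Sketch: encrypt file data a 4KiB block at a time, binding each block
    # to its index so a partial write only touches the affected blocks.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    BLOCK = 4096

    def block_nonce(file_nonce, index):
        # 96-bit AEAD nonce = 8-byte per-file nonce + 4-byte block index
        return file_nonce + index.to_bytes(4, "big")

    def encrypt_block(key, file_nonce, index, data):
        return AESGCM(key).encrypt(block_nonce(file_nonce, index), data, None)

    def decrypt_block(key, file_nonce, index, blob):
        return AESGCM(key).decrypt(block_nonce(file_nonce, index), blob, None)

    key = AESGCM.generate_key(bit_length=256)
    file_nonce = os.urandom(8)
    data = os.urandom(3 * BLOCK)

    # each 4KiB block is sealed independently, so a partial write only has
    # to re-encrypt the blocks it touches (a real design also has to avoid
    # nonce reuse on rewrites, e.g. by using a tweakable mode such as XTS)
    blocks = [encrypt_block(key, file_nonce, i, data[i*BLOCK:(i+1)*BLOCK])
              for i in range(3)]
    assert decrypt_block(key, file_nonce, 1, blocks[1]) == data[BLOCK:2*BLOCK]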

> > 3. older clients will be able to support encryption out of the box, given they
> >    will have no idea their data is being encrypted, nor how that is happening.
> > 
> > 
> > CHOOSING NAMESPACES
> > --------------------
> > 
> > While investigating where and how per-tenant encryption could be implemented,
> > two other ideas were on the table:
> > 
> > 1. on a per-client basis, relying on cephx entities, with an encryption key
> >    per-client, or a shared key amongst several clients; this key would be kept
> >    encrypted in the monitor's kv store with the entity's cephx key.
> > 
> > 2. on a per-pool basis.
> > 
> > The first one would definitely be feasible, but potentially tricky to
> > implement just right, without too many exceptions or involvement of other
> > portions of the stack. E.g., dealing with metadata could become tricky. Then
> > again, no single issue seemed impossible to address or likely to become a
> > showstopper.
> > 
> > As for 2., it would definitely be the easiest to implement: pool is created with
> > an 'encrypted' flag on, key is kept in the monitors, OSDs encrypt any object
> > belonging to that pool. The problem with this option, however, is how
> > coarse-grained it is. If we really wanted a per-tenant approach, one would have
> > to ensure one pool per tenant. Not necessarily a big deal if having a lot of
> > potentially small pools is acceptable. This idea was scrapped in favour of
> > encrypting namespaces instead.
> > 
> > Given RADOS already has the concept of a namespace, it might just be the ideal
> > medium to implement such an approach, as we get the best of the two options
> > above: we get finer-grained access than a pool, but still with the same
> > capability of limiting access by entity through caps. We also get to have
> > multiple namespaces in a single pool should we choose to do so. All the
> > while, the concept is high-level enough that the actual encryption scheme
> > might be implemented in a select handful of places, without the need for
> > many (maybe any) special exceptions or corner cases.
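
For reference, namespace-level access control via cephx caps already
looks something like the sketch below with the python rados bindings.
The client name, pool and namespace are hypothetical; the cap string
in the comment is the usual pool=/namespace= restriction:

    # Sketch: a tenant entity confined to one namespace of one pool.
    # Hypothetical entity, created with caps along the lines of:
    #   ceph auth get-or-create client.tenant-a \
    #       mon 'allow r' \
    #       osd 'allow rw pool=mypool namespace=tenant-a'
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf", rados_id="tenant-a")
    cluster.connect()
    ioctx = cluster.open_ioctx("mypool")

    ioctx.set_namespace("tenant-a")
    ioctx.write_full("obj1", b"ok")          # allowed by the caps above

    ioctx.set_namespace("tenant-b")
    try:
        ioctx.write_full("obj1", b"nope")    # outside the cap; should be rejected
    except (rados.Error, OSError) as exc:
        print("denied:", exc)

    ioctx.close()
    cluster.shutdown()

A per-namespace encryption key would then be one more thing keyed off
that same (pool, namespace) pair.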
> > 
> > 
> > APPROACH
> > ---------
> > 
> > It is important to note that there are several implementation details,
> > especially on "how exactly this is going to happen", that have not been fully
> > figured out.
> > 
> > Essentially, the objective is to ensure that objects from a given namespace are
> > always encrypted or decrypted by bluestore when writing or reading the data. The
> > hope is that operating at this level will allow us to
> > 
> > 1. perform the operation at the disk block size, ensuring that small or
> >    partial writes will not require a rewrite of the whole object; the same
> >    goes for reads.
> > 
> > 2. avoid dealing with all the mechanics involving objects and other operations
> >    over them, and focus solely on their data and metadata.
> > 
> > Secret distribution is expected to be done by the monitors, at the OSDs' request.
> > In an ideal world, the OSDs would know exactly which namespaces they might have
> > to encrypt/decrypt, based on pools they currently hold, and request keys for
> > those beforehand, such that they don't have to request a key from the monitor
> > when an operation arrives. This would not only require us to become a bit more
> > aware of namespaces, but keeping these keys cached might also require the osd to
> > keep them encrypted in memory. What to use for that is something that hasn't
> > been given much thought yet -- maybe we could get away with using the osd's
> > cephx key.
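
To illustrate the "keep cached keys encrypted in memory" part, here's
a toy sketch of a wrapped key cache. The class, the names, and the
idea of deriving the wrapping key from the osd's cephx key are just
assumptions drawn from the paragraph above, not an actual design:

    # Toy sketch of a per-namespace key cache where the keys are held
    # wrapped (encrypted) in memory and only unwrapped at use time. The
    # wrapping key standing in for "the osd's cephx key" is hypothetical.
    # Uses the third-party "cryptography" package.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    class NamespaceKeyCache:
        def __init__(self, wrapping_key):
            self._kek = AESGCM(wrapping_key)
            self._wrapped = {}   # (pool, namespace) -> (nonce, wrapped key)

        def put(self, pool, ns, key):
            nonce = os.urandom(12)
            aad = f"{pool}/{ns}".encode()
            self._wrapped[(pool, ns)] = (nonce, self._kek.encrypt(nonce, key, aad))

        def get(self, pool, ns):
            nonce, blob = self._wrapped[(pool, ns)]
            return self._kek.decrypt(nonce, blob, f"{pool}/{ns}".encode())

    # e.g. prefetched from the monitors for namespaces this osd may serve
    cache = NamespaceKeyCache(os.urandom(32))   # stand-in for a cephx-derived KEK
    cache.put("mypool", "tenant-a", os.urandom(32))
    ns_key = cache.get("mypool", "tenant-a")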
> > 
> > As for the namespaces, in their current form we don't have much (any?)
> > information about them. Access to an object in a namespace is based on prior
> > knowledge of that namespace and the object's name. We currently don't have
> > statistics on namespaces, nor are we able to know whether an OSD keeps any
> > object belonging to a namespace _before_ an operation on such an object is
> > handled.
> > 
> > Even though it's not particularly _required_ to get more out of namespaces than
> > we currently have, it would definitely be ideal if we ended up with the ability
> > to 1) get statistics out of namespaces, as that would be imperative if we're
> > using them for tenants; and 2) cache, ahead of time, the keys for namespaces an
> > osd might have to handle (read, namespaces living in a pool with PGs mapped to
> > a given osd).
> > 
> > _______________________________________________
> > Dev mailing list -- dev@xxxxxxx
> > To unsubscribe send an email to dev-leave@xxxxxxx
> > 
> 
> 

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


