SSE-KMS vs SSE-S3 with per-object-data-keys

Hi,

I really appreciate the recently added SSE-S3 encryption in radosgw. As far as I know, this encryption works very similarly to the "original" design in Amazon S3:

- it uses a per-bucket master key (used solely to encrypt the data keys), stored under the Vault path given by rgw_crypt_sse_s3_vault_prefix,
- and it creates a per-object data key to encrypt each individual uploaded object, which is stored encrypted in the object metadata.

To do this, Ceph relies on HashiCorp Vault's transit engine, which supports exactly this master-key/data-key scenario (see the sketch below).
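
Just to make that pattern concrete, here is a minimal Python sketch of the master-key/data-key round trip against the transit engine. The endpoint paths (transit/keys, transit/datakey/plaintext, transit/decrypt) are real transit API routes; the Vault address, token and key name are made up for the example:

import base64
import requests

VAULT_ADDR = "http://127.0.0.1:8200"           # assumption: local dev Vault
HEADERS = {"X-Vault-Token": "dev-only-token"}  # assumption: token auth

bucket_key = "bucket-3fa85f64"  # per-bucket master key name (hypothetical)

# 1. Ensure the per-bucket master key exists (idempotent; no-op if present).
requests.post(f"{VAULT_ADDR}/v1/transit/keys/{bucket_key}", headers=HEADERS)

# 2. Ask transit for a fresh per-object data key: Vault returns it both in
#    plaintext (for encrypting the object) and wrapped by the master key
#    (for storing in the object metadata).
resp = requests.post(
    f"{VAULT_ADDR}/v1/transit/datakey/plaintext/{bucket_key}",
    headers=HEADERS,
).json()["data"]
data_key = base64.b64decode(resp["plaintext"])   # use for AES on the object
wrapped_key = resp["ciphertext"]                 # store in object metadata

# 3. Later, on GET: unwrap the stored ciphertext to recover the data key.
resp = requests.post(
    f"{VAULT_ADDR}/v1/transit/decrypt/{bucket_key}",
    headers=HEADERS,
    json={"ciphertext": wrapped_key},
).json()["data"]
recovered = base64.b64decode(resp["plaintext"])
assert recovered == data_key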

In contrast, the somewhat older SSE-KMS implementation lacks support for individual per-object data keys. It even lacks support for an "undefined" key-id, which is a perfectly valid use case in Amazon S3.

Now that the new SSE-S3 implementation is done, I would like to ask whether it would be possible to rewrite/enhance the SSE-KMS implementation (at least when combined with Vault's transit engine) to behave like the SSE-S3 implementation, both in terms of the master-key/data-key scheme and in terms of generating its own per-bucket master key when no key-id is given.

This way, the implementation would be nearly identical to Amazon S3's design, and it could be 100% backwards compatible, with no impact on existing setups and already stored data. As an implementation note, the "new" KMS implementation would simply need to reuse the same functionality/code as the SSE-S3 implementation, extended to support both use cases: a given key-id and an undefined one.

So, in pseudo-code, the KMS implementation could look like this (a rough code sketch follows after the list):

- no key-id given:
Currently, this throws an unsupported-operation error. In the future, it could simply do the same magic as SSE-S3 (at least when combined with Vault transit): get the per-bucket key (or create one on the first request), stored under rgw_crypt_vault_prefix - this is the difference from SSE-S3. Then proceed as if that key-id had been given.

- key-id given in the request:
Currently, it pulls the key by id from Vault and encrypts the data with it. In the future, it could generate a new data key based on the given key-id and use that to encrypt the data (exactly as in the SSE-S3 case).
If Vault's transit engine is not available (e.g. the kv/kv-v2 engine, or another crypto backend that does not support data keys), simply keep the old behaviour: pull the key and encrypt the data with it.
For an already stored object: check the object metadata for a data key stored alongside it; if present, use the SSE-S3-like workflow of decrypting the data key and then decrypting the object data with it. If there is no data key in the object metadata, the "old-workflow" key-id should be stored there instead; in that case, use the old workflow of pulling the key from Vault and decrypting the data with it.
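
To make the above a bit more concrete, here is a rough Python sketch of the proposed dispatch. This is not actual RGW code; all helper functions (get_or_create_bucket_key, vault_generate_data_key, vault_unwrap, vault_fetch_key, vault_secret_engine, aes_encrypt/aes_decrypt) and the metadata attribute names are hypothetical placeholders:

def kms_encrypt(request, bucket, obj, plaintext):
    key_id = request.headers.get("x-amz-server-side-encryption-aws-kms-key-id")

    if key_id is None:
        # New behaviour: fall back to a per-bucket master key stored under
        # rgw_crypt_vault_prefix (created on first use), then proceed exactly
        # as if that key-id had been supplied in the request.
        key_id = get_or_create_bucket_key(bucket)

    if vault_secret_engine() == "transit":
        # SSE-S3-like path: ask transit for a per-object data key, keep the
        # wrapped copy in the object metadata, encrypt with the plaintext copy.
        data_key, wrapped = vault_generate_data_key(key_id)
        obj.meta["wrapped-data-key"] = wrapped   # hypothetical attribute name
        obj.meta["kms-key-id"] = key_id
        return aes_encrypt(data_key, plaintext)

    # Old behaviour for backends without data-key support (e.g. the kv engine):
    # pull the key by id from Vault and encrypt the data with it directly.
    obj.meta["kms-key-id"] = key_id
    return aes_encrypt(vault_fetch_key(key_id), plaintext)


def kms_decrypt(obj, ciphertext):
    key_id = obj.meta["kms-key-id"]
    wrapped = obj.meta.get("wrapped-data-key")

    if wrapped is not None:
        # New-style object: unwrap the stored data key via transit, then
        # decrypt the object data with it (same workflow as SSE-S3).
        return aes_decrypt(vault_unwrap(key_id, wrapped), ciphertext)

    # Old-style object: no data key stored, so the key-id points at the key
    # itself; pull it from Vault and decrypt directly (backwards compatible).
    return aes_decrypt(vault_fetch_key(key_id), ciphertext)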

The changes would not be too complex, and the gains would be that Ceph always uses a master-key/data-key scheme (instead of just "the key" given by key-id), and that it would add support for SSE-KMS without a given key-id. Amazon calls this SSE-KMS with a customer managed key (when the key-id is given) or SSE-KMS with an AWS managed key (when no key-id is given); in both cases the user's Vault would be used to store/retrieve the master keys, in contrast to Amazon's own internal key store in the SSE-S3 case.

I would like to help here with ideas.

Best
Stefan





