On 5/26/23 23:09, Alexander E. Patrakov wrote:
Hello Frank,
On Fri, May 26, 2023 at 6:27 PM Frank Schilder <frans@xxxxxx> wrote:
Hi all,
jumping on this thread as we have requests for which per-client fs mount encryption makes a lot of sense:
What kind of security do you want to achieve with encryption keys stored
on the server side?
One of the use cases is a user requesting a share with encryption at rest. Since encryption has an unavoidable performance impact, it is impractical to make 100% of users pay for a requirement that only 1% of users really have. Instead of all-OSD back-end encryption hitting everyone for little reason, encrypting only some user buckets/fs shares at the front-end application level ensures that the data is encrypted at rest.
I would disagree about the unavoidable performance impact of at-rest
encryption of OSDs. Read the CloudFlare blog article which shows how
they make the encryption impact on their (non-Ceph) drives negligible:
https://blog.cloudflare.com/speeding-up-linux-disk-encryption/. The
main part of their improvements (the ability to disable dm-crypt
workqueues) is already in the mainline kernel. There is also a Ceph
pull request that disables dm-crypt workqueues on certain drives:
https://github.com/ceph/ceph/pull/49554
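For anyone who wants to experiment with this outside of Ceph first, here
is a minimal sketch of bypassing the workqueues on a plain dm-crypt/LUKS2
device (device and mapping names are placeholders; the --perf-* options
need cryptsetup >= 2.3.4):

  # Open a LUKS2 device with the dm-crypt workqueues bypassed
  cryptsetup open /dev/nvme0n1 crypt-nvme0n1 \
      --perf-no_read_workqueue --perf-no_write_workqueue

  # Or apply the flags to an already-open mapping, and store them in
  # the LUKS2 header so they are used on every activation
  cryptsetup refresh crypt-nvme0n1 --persistent \
      --perf-no_read_workqueue --perf-no_write_workqueue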
Indeed. With the workqueue bypass option enabled for flash devices, the
overhead of crypto is really low. Here is a partial repost from an email I
sent earlier:
I repeated the tests from Cloudflare and could draw the same
conclusions: TL;DR: performance is increased a lot and less CPU is used.
Some fio 4k write, iodepth=1, performance numbers on a Samsung PM983
3.84 TB drive (Ubuntu 22.04 with HWE kernel 5.15.0-52-generic, AMD EPYC
7302P 16-Core Processor, C-state pinning, CPU performance mode on,
Samsung PM983 firmware EDA5702Q):
Unencrypted NVMe:
write: IOPS=63.3k, BW=247MiB/s (259MB/s)(62.6GiB/259207msec); 0 zone resets
clat (nsec): min=13190, max=56400, avg=15397.89, stdev=1506.45
lat (nsec): min=13250, max=56940, avg=15462.03, stdev=1507.88
Encrypted (without no_write_workqueue / no_read_workqueue):
write: IOPS=34.8k, BW=136MiB/s (143MB/s)(47.4GiB/357175msec); 0 zone resets
clat (usec): min=24, max=1221, avg=28.12, stdev= 2.98
lat (usec): min=24, max=1221, avg=28.37, stdev= 2.99
Encrypted (with no_write_workqueue / no_read_workqueue enabled):
write: IOPS=55.7k, BW=218MiB/s (228MB/s)(57.3GiB/269574msec); 0 zone resets
clat (nsec): min=15710, max=87090, avg=17550.99, stdev=875.72
lat (nsec): min=15770, max=87150, avg=17614.82, stdev=876.85
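The fio job was along these lines (roughly; the device path, job name,
ioengine, runtime and randwrite-vs-write choice here are just
placeholders, not the exact command used):

  # 4k writes, queue depth 1, direct I/O against the (dm-crypt mapped)
  # NVMe device
  fio --name=4kwrite --filename=/dev/nvme0n1 \
      --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
      --ioengine=libaio --direct=1 --time_based --runtime=300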
So encryption does have a performance impact, but the added latency is
only a few microseconds. And these tests are on NVMe drives, not Ceph
OSDs. Compared to the latency Ceph itself adds to (client) IO this seems
negligible. At least when the workqueues are bypassed; otherwise a lot
of CPU seems to be involved (loads of kcryptd threads), and that might
hurt maximum performance on a system (especially if it is CPU bound).
So, today I did a comparison on a production cluster while draining an
OSD with 10 concurrent backfills:
without no_write_workqueue / no_read_workqueue: 32 kcryptd threads, each
doing on average 5.5% CPU, and the dmcrypt write thread doing ~9% CPU. So
that's almost two CPU cores.
with no_write_workqueue / no_read_workqueue: dmcrypt / cryptd threads do
not even show up in top ...
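If you want to check whether a given mapping actually has the flags
active, they show up as optional parameters at the end of the crypt line
in the device-mapper table (mapping name is a placeholder):

  # Look for no_read_workqueue / no_write_workqueue at the end of the line
  dmsetup table crypt-nvme0n1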
So if encryption is important, even for a subset of your users, I'm
pretty sure you can enable it without a negative impact.
It does require reprovisioning all of your OSDs ... which is no small
feat. This thread did start with "per user" encryption, though: if your
users do not trust your Ceph cluster, client-side encryption (i.e.
CephFS fscrypt) with a key _they_ manage is still the only way to go.
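As a rough idea of what that client-side workflow could look like with
the generic fscrypt userspace tool (assuming a kernel client and fscrypt
version that support CephFS; mount point and directory are placeholders):

  # One-time setup of fscrypt metadata on the mounted filesystem
  fscrypt setup /mnt/cephfs

  # Encrypt a directory with a key the user manages, e.g. a passphrase
  # that never leaves the client
  fscrypt encrypt /mnt/cephfs/private --source=custom_passphrase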
Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx