On 12/14/22 10:09 AM, Stefan Kooman wrote:
On 11/21/22 10:07, Stefan Kooman wrote:
On 11/8/22 21:20, Mark Nelson wrote:
2.
https://ceph.io/en/news/blog/2022/qemu-kvm-tuning/
You tested the impact of network encryption on performance. It would be nice
to see how OSD encryption (encryption at rest) impacts performance.
As far as I can see there is not much public information available on
this. However, there is one thread where this exact question was asked [1],
and it contains an interesting blog post from Cloudflare [2]. I
repeated the tests from [2] and could draw the same conclusions:
TL;DR: performance increases a lot and less CPU is used. Some fio
4k write, iodepth=1, performance numbers on a Samsung PM983 3.84 TB
drive (Ubuntu 22.04 with HWE kernel 5.15.0-52-generic, AMD EPYC
7302P 16-Core Processor, C-state pinning, CPU performance mode on,
Samsung PM983 firmware EDA5702Q):
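The exact fio job file wasn't posted, so device path and runtime below are assumptions, but a job along these lines matches the parameters described:

```ini
; fio job: 4k write, queue depth 1 (filename and runtime are assumed, adjust as needed)
[global]
ioengine=libaio
direct=1
bs=4k
iodepth=1
rw=write
time_based=1
runtime=300

[pm983-test]
filename=/dev/nvme0n1   ; point at the raw device or the dm-crypt mapping under test
```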
Unencrypted NVMe:
  write: IOPS=63.3k, BW=247MiB/s (259MB/s)(62.6GiB/259207msec); 0 zone resets
  clat (nsec): min=13190, max=56400, avg=15397.89, stdev=1506.45
   lat (nsec): min=13250, max=56940, avg=15462.03, stdev=1507.88

Encrypted (without no_write_workqueue / no_read_workqueue):
  write: IOPS=34.8k, BW=136MiB/s (143MB/s)(47.4GiB/357175msec); 0 zone resets
  clat (usec): min=24, max=1221, avg=28.12, stdev=2.98
   lat (usec): min=24, max=1221, avg=28.37, stdev=2.99

Encrypted (with no_write_workqueue / no_read_workqueue enabled):
  write: IOPS=55.7k, BW=218MiB/s (228MB/s)(57.3GiB/269574msec); 0 zone resets
  clat (nsec): min=15710, max=87090, avg=17550.99, stdev=875.72
   lat (nsec): min=15770, max=87150, avg=17614.82, stdev=876.85
So encryption does have a performance impact, but the added latency
seems negligible compared to the latency Ceph itself adds to (client)
IO. At least, when the work queues are bypassed; otherwise a lot of
CPU seems to be involved (loads of kcryptd threads), and that might
hurt max performance on a system that is CPU bound.
So, I have an update on this. One of our test clusters is now running
with encrypted drives without the read/write work queues. Compared to
the default (with work queues) it saves an enormous amount of CPU: no
more hundreds of kcryptd threads consuming all available CPU.
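To confirm the bypass flags actually made it into the device-mapper table after an OSD (re)start, the dm-crypt line can be inspected. A small sketch (the mapping name is an example and the helper function is mine, not a ceph tool):

```shell
#!/bin/sh
# Return success when a dm-crypt table line carries both workqueue-bypass flags.
has_bypass_flags() {
    printf '%s' "$1" | grep -q 'no_read_workqueue' &&
        printf '%s' "$1" | grep -q 'no_write_workqueue'
}

# On a live system (as root), something like:
#   dmsetup table <osd-mapping> | while read -r line; do
#       has_bypass_flags "$line" && echo "workqueue bypass active"
#   done
#
# An already-open mapping can also be switched without a reformat via:
#   cryptsetup refresh --perf-no_read_workqueue --perf-no_write_workqueue <osd-mapping>
```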
The diff for ceph-volume encryption.py (pacific 16.2.10 docker image,
sha256:2b68483bcd050472a18e73389c0e1f3f70d34bb7abf733f692e88c935ea0a6bd):
--- encryption.py	2022-12-07 08:32:50.949778767 +0100
+++ encryption_bypass.py	2022-12-07 08:32:25.493558910 +0100
@@ -71,6 +71,8 @@
         '--key-file',
         '-',
         '--allow-discards',  # allow discards (aka TRIM) requests for device
+        '--perf-no_read_workqueue',   # no read workqueue
+        '--perf-no_write_workqueue',  # no write workqueue
         'open',
         device,
         mapping,
@@ -98,6 +100,8 @@
         '--key-file',
         '-',
         '--allow-discards',  # allow discards (aka TRIM) requests for device
+        '--perf-no_read_workqueue',   # no read workqueue
+        '--perf-no_write_workqueue',  # no write workqueue
         'luksOpen',
         device,
         mapping,
Performance seems improved for single-threaded IO with iodepth=1.
Random read performance with iodepth=32 is lower than the default,
which achieves its higher throughput at the cost of extra CPU.
However, that is not all there is to it. Newish cryptsetup will
auto-determine what sector size to use for encryption.
To hard-code it (for testing purposes), the following option can be
added to the def luks_format(key, device): function:

'--sector-size=4096',  # force 4096 sector size for now; should be auto-derived from physical_block_size
So, ideally this should be auto-determined by ceph-volume. As a matter
of fact, the util/disk.py script does collect this information, but it
does not seem to be used here. Info on physical / logical block size
can be read from:
/sys/block/<device>/queue/physical_block_size and
/sys/block/<device>/queue/logical_block_size
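As a sketch of how that auto-detection could look (the helper names are mine, not actual ceph-volume API; the sysfs paths are the ones above, and the 4K-only-when-both-report-4K rule is a conservative assumption — cryptsetup's own heuristic may differ):

```python
def read_block_size(device, kind="physical"):
    """Read the physical or logical block size (bytes) of e.g. 'nvme0n1' from sysfs."""
    path = "/sys/block/{}/queue/{}_block_size".format(device, kind)
    with open(path) as f:
        return int(f.read().strip())

def choose_sector_size(physical_block_size, logical_block_size):
    """Pick a LUKS --sector-size: 4096 only when the device reports 4K for
    both sizes, 512 otherwise (conservative default)."""
    if physical_block_size >= 4096 and logical_block_size >= 4096:
        return 4096
    return 512
```

luks_format() could then pass '--sector-size={}'.format(choose_sector_size(...)) instead of hard-coding 4096.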
According to [1] performance is improved (on NVMe devices) by 2-3%.
According to this thread [2] you want to use a 4K sector size and only
use "--perf-no_read_workqueue". I have not tested this combination yet.
Strangely enough, cryptsetup 2.4.3 chose a 4096-byte sector size
although physical_block_size and logical_block_size were both 512
bytes for a SAMSUNG MZQLB3T8HALS-00007 disk.
I will reformat an NVMe into 4K native blocks and do a performance
comparison, both with and without encryption, to see what comes out.
The cluster I'm testing on seems to give high variability in the
tests, so I'm going to set up a new cluster with NVMe only and repeat
the tests. It would be great if more people could give it a try and
post their results.
Gr. Stefan
[1]: https://fedoraproject.org/wiki/Changes/LUKSEncryptionSectorSize
[2]: https://www.reddit.com/r/Fedora/comments/rzvhyg/default_luks_encryption_settings_on_fedora_can_be/
This is great work! Would you consider making a PR against main for the
change to ceph-volume? Given that you have performance data it sounds
like good justification. I'm not sure who's merging changes to
ceph-volume these days, but I can try to find out if no one is biting.
Mark
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx