On 11/21/22 10:07, Stefan Kooman wrote:
On 11/8/22 21:20, Mark Nelson wrote:
2.
https://ceph.io/en/news/blog/2022/qemu-kvm-tuning/
You tested network encryption impact on performance. It would be nice to
see how OSD encryption (encryption at rest) impacts performance. As far
as I can see there is not much public information available on this.
However there is one thread with this exact question asked [1]. And it
contains an interesting blog post from Cloudflare [2]. I repeated the
tests from [2] and could draw the same conclusions. TL;DR: performance
increases a lot and less CPU is used. Some fio 4k write, iodepth=1
performance numbers on a Samsung PM983 3.84 TB drive (Ubuntu 22.04 with
HWE kernel 5.15.0-52-generic, AMD EPYC 7302P 16-core processor, C-state
pinning, CPU performance mode on, Samsung PM983 firmware EDA5702Q):
Unencrypted NVMe:
write: IOPS=63.3k, BW=247MiB/s (259MB/s)(62.6GiB/259207msec); 0 zone resets
clat (nsec): min=13190, max=56400, avg=15397.89, stdev=1506.45
lat (nsec): min=13250, max=56940, avg=15462.03, stdev=1507.88
Encrypted (without no_write_workqueue / no_read_workqueue):
write: IOPS=34.8k, BW=136MiB/s (143MB/s)(47.4GiB/357175msec); 0 zone resets
clat (usec): min=24, max=1221, avg=28.12, stdev= 2.98
lat (usec): min=24, max=1221, avg=28.37, stdev= 2.99
Encrypted (with no_write_workqueue / no_read_workqueue enabled):
write: IOPS=55.7k, BW=218MiB/s (228MB/s)(57.3GiB/269574msec); 0 zone resets
clat (nsec): min=15710, max=87090, avg=17550.99, stdev=875.72
lat (nsec): min=15770, max=87150, avg=17614.82, stdev=876.85
So encryption does have a performance impact, but the added latency
compared to the latency Ceph itself adds to (client) IO seems
negligible, at least when the work queues are bypassed. Otherwise a lot
of CPU is involved (loads of kcryptd threads), and that might hurt
maximum performance on a CPU-bound system.
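To put a number on "negligible", here is a back-of-the-envelope
comparison using the average latencies from the fio runs above (the
values are just the avg lat figures quoted in this mail; against the
hundreds of microseconds Ceph typically adds per client IO, a couple of
microseconds disappears in the noise):

```python
# Average latencies (microseconds) from the fio runs quoted above.
unencrypted_us = 15.46       # unencrypted NVMe
encrypted_bypass_us = 17.61  # encrypted, work queues bypassed
encrypted_default_us = 28.37 # encrypted, default work queues

# Per-IO overhead added by dm-crypt in each configuration.
bypass_overhead_us = encrypted_bypass_us - unencrypted_us
default_overhead_us = encrypted_default_us - unencrypted_us

print(f"bypass adds  {bypass_overhead_us:.2f} us "
      f"({100 * bypass_overhead_us / unencrypted_us:.0f}% of raw latency)")
print(f"default adds {default_overhead_us:.2f} us "
      f"({100 * default_overhead_us / unencrypted_us:.0f}% of raw latency)")
```

So the bypassed configuration adds roughly 2 us per IO at the device
level, versus almost 13 us with the default work queues.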
So, I have an update on this. One of our test clusters is now running
with encrypted drives without the read/write work queues. Compared to
the default (with work queues) it saves an enormous amount of CPU: no
more hundreds of kcryptd threads consuming all available CPU.
The diff for ceph-volume encryption.py (pacific 16.2.10 docker image,
sha256:2b68483bcd050472a18e73389c0e1f3f70d34bb7abf733f692e88c935ea0a6bd):
--- encryption.py 2022-12-07 08:32:50.949778767 +0100
+++ encryption_bypass.py 2022-12-07 08:32:25.493558910 +0100
@@ -71,6 +71,8 @@
     '--key-file',
     '-',
     '--allow-discards',  # allow discards (aka TRIM) requests for device
+    '--perf-no_read_workqueue',  # no read workqueue
+    '--perf-no_write_workqueue',  # no write workqueue
     'open',
     device,
     mapping,
@@ -98,6 +100,8 @@
     '--key-file',
     '-',
     '--allow-discards',  # allow discards (aka TRIM) requests for device
+    '--perf-no_read_workqueue',  # no read workqueue
+    '--perf-no_write_workqueue',  # no write workqueue
     'luksOpen',
     device,
     mapping,
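For clarity, the patched invocation amounts to something like the
following sketch (luks_open_argv is a made-up helper name, not the
actual ceph-volume code; the --perf-* flags need a reasonably recent
cryptsetup and a kernel with dm-crypt workqueue-bypass support, 5.9+):

```python
def luks_open_argv(device, mapping, bypass_workqueues=True):
    """Build the cryptsetup command line as in the patch above."""
    argv = [
        'cryptsetup',
        '--key-file', '-',   # read the key from stdin
        '--allow-discards',  # allow discards (aka TRIM) requests for device
    ]
    if bypass_workqueues:
        argv += [
            '--perf-no_read_workqueue',   # no read workqueue
            '--perf-no_write_workqueue',  # no write workqueue
        ]
    argv += ['luksOpen', device, mapping]
    return argv
```

This makes the bypass an opt-in toggle rather than a hard-coded pair of
flags, which would also make it easy to expose as a ceph-volume option.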
The performance seems to be improved for single-threaded IO with
iodepth=1. Random read performance with iodepth=32, however, is lower
than with the default work queues (which achieve it at the cost of
extra CPU).
However, that is not all there is to it. Newer cryptsetup versions
auto-determine the sector size to use for encryption.
To hard-code it (for testing purposes) the following option can be
added to the luks_format(key, device) function:
'--sector-size=4096',  # force 4096 sector size for now; should be
auto-derived from physical_block_size
So, ideally this should be auto-determined by ceph-volume. As a matter
of fact, the util/disk.py script does collect this information, but it
does not seem to be used here. Info on physical/logical block size can
be read from:
/sys/block/<device>/queue/physical_block_size and
/sys/block/<device>/queue/logical_block_size
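A minimal sketch of what such auto-detection could look like
(choose_sector_size and read_block_size are hypothetical helpers, not
ceph-volume code; sysfs_root is a parameter only so the logic can be
exercised without a real device, and the fallback to 512 is the
conservative choice when sysfs gives no usable answer):

```python
import os

def read_block_size(device, kind, sysfs_root='/sys/block'):
    """Read physical_block_size or logical_block_size for a device
    name like 'nvme0n1'; return None if the attribute is missing."""
    path = os.path.join(sysfs_root, device, 'queue', f'{kind}_block_size')
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def choose_sector_size(device, sysfs_root='/sys/block'):
    """Pick a LUKS2 --sector-size from the reported physical block
    size, falling back to the always-safe 512 bytes."""
    phys = read_block_size(device, 'physical', sysfs_root)
    return phys if phys in (512, 4096) else 512
```

The same read_block_size call on 'logical' would let ceph-volume sanity-
check that the chosen sector size is not smaller than the logical block
size before formatting.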
According to [1], performance is improved (on NVMe devices) by 2-3%.
According to this thread [2] you want to use a 4K sector size and only
use "--perf-no_read_workqueue". I have not tested this combination yet.
Strangely enough, cryptsetup 2.4.3 chose a 4096-byte sector size even
though physical_block_size and logical_block_size were both 512 bytes
on a SAMSUNG MZQLB3T8HALS-00007 disk.
I will reformat an NVMe into 4K native blocks and do a performance
comparison, both with and without encryption to see what comes out.
The cluster I'm testing on seems to give high variability in the tests.
So I'm going to set up a new cluster with NVMe only and repeat the
tests. It would be great if more people could give it a try and post
their results.
Gr. Stefan
[1]: https://fedoraproject.org/wiki/Changes/LUKSEncryptionSectorSize
[2]: https://www.reddit.com/r/Fedora/comments/rzvhyg/default_luks_encryption_settings_on_fedora_can_be/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx