Re: Recent ceph.io Performance Blog Posts

On 12/14/22 10:09 AM, Stefan Kooman wrote:
On 11/21/22 10:07, Stefan Kooman wrote:
On 11/8/22 21:20, Mark Nelson wrote:

2. https://ceph.io/en/news/blog/2022/qemu-kvm-tuning/

You tested the impact of network encryption on performance. It would be nice to see how OSD encryption (encryption at rest) impacts performance as well. As far as I can see there is not much public information available on this, but there is one thread with this exact question asked [1], and it contains an interesting blog post from Cloudflare [2]. I repeated the tests from [2] and could draw the same conclusions. TL;DR: with the dm-crypt work queues bypassed, performance increases a lot and less CPU is used. Some fio 4k write, iodepth=1, performance numbers on a Samsung PM983 3.84 TB drive (Ubuntu 22.04 with HWE kernel 5.15.0-52-generic, AMD EPYC 7302P 16-Core Processor, C-state pinning, CPU performance mode on, Samsung PM983 firmware EDA5702Q):

Unencrypted NVMe:

write: IOPS=63.3k, BW=247MiB/s (259MB/s)(62.6GiB/259207msec); 0 zone resets
     clat (nsec): min=13190, max=56400, avg=15397.89, stdev=1506.45
      lat (nsec): min=13250, max=56940, avg=15462.03, stdev=1507.88


Encrypted (without no_write_workqueue / no_read_workqueue):

   write: IOPS=34.8k, BW=136MiB/s (143MB/s)(47.4GiB/357175msec); 0 zone resets
     clat (usec): min=24, max=1221, avg=28.12, stdev= 2.98
      lat (usec): min=24, max=1221, avg=28.37, stdev= 2.99


Encrypted (with no_write_workqueue / no_read_workqueue enabled):

write: IOPS=55.7k, BW=218MiB/s (228MB/s)(57.3GiB/269574msec); 0 zone resets
     clat (nsec): min=15710, max=87090, avg=17550.99, stdev=875.72
      lat (nsec): min=15770, max=87150, avg=17614.82, stdev=876.85

So encryption does have a performance impact, but the added latency seems negligible compared to the latency Ceph itself adds to (client) IO. At least when the work queues are bypassed; otherwise a lot of CPU is involved (loads of kcryptd threads), and that might hurt maximum performance on a system that is CPU bound.
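A fio invocation along these lines should reproduce the test above (device path, runtime and job name are placeholders; for the encrypted runs --filename points at the dm-crypt mapping instead of the raw device):

fio --name=4k-write --filename=/dev/nvme0n1 --rw=write --bs=4k --iodepth=1 \
    --ioengine=libaio --direct=1 --time_based --runtime=300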

So, I have an update on this. One of our test clusters is now running with encrypted drives without the read/write work queues. Compared to the default (with work queues) it saves an enormous amount of CPU: no more hundreds of kcryptd threads consuming all available CPU.
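As a side note: on LUKS2 devices the flags can also be applied to an already-formatted OSD without patching ceph-volume, by re-activating the mapping with a recent cryptsetup (mapping name is a placeholder; I have not checked how --persistent interacts with ceph-volume redeploys):

cryptsetup refresh --perf-no_read_workqueue --perf-no_write_workqueue --persistent <mapping-name>

With --persistent the flags are stored in the LUKS2 header, so later activations should pick them up automatically.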

The diff for ceph-volume encryption.py (pacific 16.2.10 docker image, sha256:2b68483bcd050472a18e73389c0e1f3f70d34bb7abf733f692e88c935ea0a6bd):

--- encryption.py        2022-12-07 08:32:50.949778767 +0100
+++ encryption_bypass.py    2022-12-07 08:32:25.493558910 +0100
@@ -71,6 +71,8 @@
           '--key-file',
           '-',
           '--allow-discards',  # allow discards (aka TRIM) requests for device
+          '--perf-no_read_workqueue',   # no read workqueue
+          '--perf-no_write_workqueue',  # no write workqueue
           'open',
           device,
           mapping,
@@ -98,6 +100,8 @@
           '--key-file',
           '-',
           '--allow-discards',  # allow discards (aka TRIM) requests for device
+          '--perf-no_read_workqueue',   # no read workqueue
+          '--perf-no_write_workqueue',  # no write workqueue
           'luksOpen',
           device,
           mapping,
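
After re-activating an OSD with the patched version it is easy to verify that the flags actually took effect (mapping name is a placeholder):

cryptsetup status <mapping-name>   # the "flags" line should list no_read_workqueue / no_write_workqueue
dmsetup table <mapping-name>       # the crypt target should show the same flags as optional parameters

And of course the flood of kcryptd threads should be gone from top.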

Performance seems to be improved for single-threaded IO with iodepth=1. Random read performance with iodepth=32 is lower than with the default work queues, which get their higher throughput at the cost of extra CPU.

However, that is not all there is to it. Newer cryptsetup versions automatically determine which sector size to use for encryption.

To hard-code it (for testing purposes), the following option can be added to the luks_format(key, device) function:

'--sector-size=4096',  # force 4096 sector size for now; should be auto-derived from physical_block_size

So, ideally this should be auto-determined by ceph-volume. As a matter of fact, the util/disk.py script does already collect this information, but it does not seem to be used here. The physical/logical block size can be read from:

/sys/block/<device>/queue/physical_block_size and /sys/block/<device>/queue/logical_block_size
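
Reading the values back is trivial, so a manual experiment outside ceph-volume could look roughly like this (device name is a placeholder, --sector-size requires LUKS2, and luksFormat of course wipes the device):

SECTOR=$(cat /sys/block/nvme0n1/queue/physical_block_size)
cryptsetup luksFormat --type luks2 --sector-size="$SECTOR" /dev/nvme0n1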

According to [1], performance is improved (on NVMe devices) by 2-3%. According to this thread [2] you want to use a 4K sector size and only "--perf-no_read_workqueue". I have not tested this combination yet.

Strangely enough, cryptsetup 2.4.3 chose to use a 4096-byte sector size although both physical_block_size and logical_block_size were 512 bytes for the SAMSUNG MZQLB3T8HALS-00007 disk.
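
Which sector size was actually chosen can be checked on a LUKS2 device with luksDump (device path is a placeholder); the data segment should show a "sector: 4096 [bytes]" line:

cryptsetup luksDump /dev/nvme0n1 | grep sector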

I will reformat an NVMe into 4K native blocks and do a performance comparison, both with and without encryption, to see what comes out.
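
For reference, switching the LBA format can be done with nvme-cli; the right format index differs per drive, and nvme format wipes the namespace, so scratch disks only:

nvme id-ns -H /dev/nvme0n1 | grep 'Data Size'   # list the supported LBA formats
nvme format /dev/nvme0n1 --lbaf=1               # select the 4096-byte LBA format (index 1 here)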

The cluster I'm testing on seems to give high variability in the tests, so I'm going to set up a new cluster with NVMe only and repeat the tests. It would be great if more people could give it a try and post their results.

Gr. Stefan

[1]: https://fedoraproject.org/wiki/Changes/LUKSEncryptionSectorSize
[2]: https://www.reddit.com/r/Fedora/comments/rzvhyg/default_luks_encryption_settings_on_fedora_can_be/


This is great work!  Would you consider making a PR against main for the change to ceph-volume?  Given that you have performance data, it sounds like good justification.  I'm not sure who's merging changes to ceph-volume these days, but I can try to find out if no one is biting.


Mark

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



