Re: 4k IOPS: miserable performance in All-SSD cluster

Can you check whether you have any power-saving settings enabled? Make sure the CPU is set to max performance: use the cpupower tool to check and disable all C-states, and run at max frequency.
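A quick check could look something like this (a rough sketch; the exact governors and available idle states depend on your kernel and BIOS settings):

  # show the current governor and frequency limits
  cpupower frequency-info

  # show which C-states exist and whether they are enabled
  cpupower idle-info

  # force the performance governor on all cores
  cpupower frequency-set -g performance

  # disable every idle state with an exit latency above 0 (i.e. all deep C-states)
  cpupower idle-set -D 0

It is also worth double-checking the BIOS power/performance profile on the Dell boxes, since firmware-level power saving can undo these OS-level settings.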

For HDD at qd=1, 60 IOPS is OK.

For SSD at qd=1, you should get roughly 3-5k IOPS read and about 1k IOPS write, but if your CPU is power-saving, 400 IOPS write is entirely possible.
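As a rough sanity check (simple arithmetic, nothing cluster-specific assumed): at qd=1, IOPS is just the inverse of the per-write round-trip latency, so

  1000 IOPS write  ->  about 1.0 ms per write
   400 IOPS write  ->  about 2.5 ms per write

which means you are only hunting for roughly 1.5 ms of extra latency per operation, and deep C-state wake-ups plus a low clock along the network/OSD path can plausibly account for that.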

Note that with qd=1 it does not matter whether you have 250 OSDs or 3 OSDs; your IOPS will be the same, because you are only touching 3 OSDs (one PG: the primary plus two replicas) at any moment during the test. As you add threads/queue depth, IOPS will increase, linearly at first, then saturate at roughly 8-16x the OSD count. So the more OSDs you have, the more client threads can be served: each thread still sees a maximum of about 1k write IOPS, but the aggregate IOPS across all threads/queue depth scales out. With 250 OSDs you should be able to handle a qd of 1k-2k. Of course this is a simplification, as other resources such as CPU can become a bottleneck; you will get better performance if your 250 OSDs are spread over 50 hosts than over 10, because you will have more CPU power.
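To see that scaling yourself, you can sweep the rados bench concurrency, something along these lines (pool name and durations are illustrative; a single rados bench client also becomes a bottleneck at high -t, so you may need several clients in parallel to really drive a qd of 1k-2k):

  # qd=1 baseline, 4k writes, keep the objects around
  rados bench -p testpool 30 write -b 4096 -t 1 --no-cleanup

  # increase client concurrency and watch the aggregate IOPS scale
  rados bench -p testpool 30 write -b 4096 -t 16 --no-cleanup
  rados bench -p testpool 30 write -b 4096 -t 64 --no-cleanup
  rados bench -p testpool 30 write -b 4096 -t 256 --no-cleanup

  # remove the benchmark objects afterwards
  rados -p testpool cleanup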


On 25/11/2024 16:22, Martin Gerhard Loschwitz wrote:
Folks,

I am getting somewhat desperate debugging multiple setups here within the same environment. Three clusters, two SSD-only, one HDD-only, and what they all have in common is abysmal 4k IOPS performance when measuring with „rados bench“. Abysmal means: in an all-SSD cluster I get roughly 400 IOPS across more than 250 devices. I know SAS SSDs are not ideal, but 400 IOPS over 250 devices looks a bit on the low side of things to me.

In the second cluster, also all-SSD, I get roughly 120 4k IOPS, and the HDD-only cluster delivers 60 4k IOPS. Both of the latter have substantially fewer devices, granted, but even with 20 HDDs, 68 4k IOPS seems like a very bad value to me.

I’ve tried to rule out everything I know of: BIOS misconfiguration, HBA problems, networking trouble (I am seeing comparably bad values with a size=1 pool), and so on and so forth. But to no avail. Has anybody dealt with something similar on Dell hardware, or in general? What could cause such extremely bad benchmark results?

I measure with rados bench and qd=1 at 4k block size. „ceph tell osd bench“ with 4k blocks yields 30k+ IOPS for every single device in the big cluster, and yet all that leads to is 400 IOPS in total when writing to it? Even with no replication in place? That looks a bit off, doesn’t it? Any help would be greatly appreciated; even a pointer in the right direction would be held in high esteem right now. Thank you very much in advance!
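For reference, the invocations in question look roughly like this (pool name, OSD id, and byte totals are illustrative):

  # client-side: 4k writes at queue depth 1
  rados bench -p testpool 60 write -b 4096 -t 1

  # per-OSD raw benchmark: write ~12 MB in 4k blocks to osd.0
  ceph tell osd.0 bench 12288000 4096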

Best regards
Martin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx