> [...] All-SSD cluster I will get roughly 400 IOPS over more
> than 250 devices. I know SAS-SSDs are not ideal, but 250
> looks a bit on the low side of things to me. In the second
> cluster, also All-SSD based, I get roughly 120 4k IOPS. And
> the HDD-only cluster delivers 60 4k IOPS.

Regardless of the specifics: 4KiB write IOPS is definitely not what
Ceph was designed for. Yet so many people know better and use Ceph
for VM disk images, even with logs and databases on them.

> [...] „ceph tell osd bench“ with 4k blocks yields 30k+ IOPS
> for every single device in the big cluster, and all that leads
> to is 400 IOPS in total when writing to it? Even with no
> replication in place? [..]

Checks to do:

* If those are SAS SSDs they must have persistent on-device caches
  ("power loss protection"), so ensure that synchronous writes are
  disabled for them (a quick check is sketched after this list).
* What are the definitions of the metadata pool and of the data pool?
* Are you actually measuring the rate of _metadata_ operations
  (object creation and deletion) or of _data_ operations?
* Do the MON logs report "slow ops"?
* Run 'iotop' and 'top' on one MON and one OSD while running the
  benchmark.
* You mentioned 'iostat': run 'iostat -dk -zxy /dev/sd* 1' on an OSD
  during the benchmark too.
* Run something like 'nuttcp'/'iperf' between one MON and one OSD and
  between one OSD and one client.
* Run 10 and 100 parallel "ceph bench" with 4KiB blocks.
* Run 1 and 10 "ceph bench" with 64KiB and 1MiB blocks.

The last two matter because Ceph does not promise high single-thread
speed, but does much better across *many* threads.
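Some rough, untested command sketches for the checks above. For the
write cache, assuming 'sdparm' and 'hdparm' are available and
'/dev/sdX' is a placeholder for one of the OSD devices:

    # show whether the volatile write cache (WCE) is enabled on a SAS/SCSI device
    sdparm --get=WCE /dev/sdX

    # with power-loss protection present, the usual tuning is to clear it
    sdparm --clear=WCE /dev/sdX

    # rough SATA equivalent
    hdparm -W 0 /dev/sdX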
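For the pool definitions (replica count, pg_num, CRUSH rule), the
standard commands show them:

    # list all pools with size, min_size, pg_num, crush_rule, flags, ...
    ceph osd pool ls detail

    # dump the CRUSH rules those pools reference
    ceph osd crush rule dump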
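For slow ops, the cluster reports them in its health output as well as
in the MON log (the log path below is the usual default, adjust to your
deployment):

    # health warnings, including any "slow ops" entries
    ceph health detail

    # grep the MON log directly
    grep -i "slow ops" /var/log/ceph/ceph-mon.*.log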
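For the network check, a minimal 'iperf3' round trip between two of the
hosts involved (hostnames are placeholders):

    # on the receiving host, e.g. the MON
    iperf3 -s

    # on the sending host, e.g. an OSD node or a client: 10-second test
    iperf3 -c mon-host -t 10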
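For the parallel and larger-block runs, a minimal sketch around
'ceph tell osd.N bench', which takes total bytes and block size as
arguments; the OSD IDs and byte totals are only examples, and Ceph may
refuse runs whose totals exceed its built-in per-run bench limits:

    # 10 parallel 4 KiB benches, one per OSD (adjust the OSD IDs to your cluster)
    for i in $(seq 0 9); do
        ceph tell osd.$i bench 12288000 4096 &
    done
    wait

    # single-OSD runs with 64 KiB and 1 MiB blocks
    ceph tell osd.0 bench 104857600 65536
    ceph tell osd.0 bench 1073741824 1048576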