In my experience, Ceph will add around 1 ms of latency even if
everything runs on localhost. Whether this is in the client code or on
the OSDs, I don't really know. I don't even know the precise reason,
but the latency is there nevertheless.
Perhaps you can find the reason here among the tradeoffs ceph and
similar systems have to make to ensure consistency even if a partition
can happen at any time:
https://en.wikipedia.org/wiki/PACELC_theorem
With size=3, a write will go first to the primary OSD for the PG
(0.1 ms), then from there to the two replica OSDs (in parallel), so
0.2 ms more for that round trip. Then back to the client, 0.1 ms. Added
to the roughly 1 ms of internal latency, that is, very roughly, 1.4 ms
even if storage latency is 0, which it never is, even for SSDs.
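
To make that arithmetic explicit, here is a small sketch. The function
name and the default values (1 ms internal latency, 0.1 ms per network
hop) are only the rough figures from this mail, not measurements, and
storage latency is left out entirely:

```python
def write_latency_ms(size=3, ceph_ms=1.0, hop_ms=0.1):
    """Very rough latency of one replicated write, storage latency ignored."""
    to_primary = hop_ms  # client -> primary OSD
    # The primary sends to all replicas in parallel, so this costs one
    # round trip (two hops) no matter how many replicas there are.
    replication = 2 * hop_ms if size > 1 else 0.0
    to_client = hop_ms  # primary OSD -> client (ack)
    return ceph_ms + to_primary + replication + to_client


for size in (3, 1):
    print(f"size={size}: ~{write_latency_ms(size):.1f} ms")
```

Under these assumptions size=1 only removes the 0.2 ms replication
round trip; the internal latency term stays the same.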
If you set size=1, you can skip the step where the primary OSD
replicates to the 2 replicas, but you still have Ceph's internal
latency as well as the network latency to reach the primary OSD for
whatever PG the object will belong to, which could be on any server.
So expect a small improvement, but not too much.
With that said, a single thread with one operation in flight will
never exceed 1000 IOPS in a typical setup, since each operation takes
at least about 1 ms.
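
The relation behind that ceiling is just the reciprocal of the
per-operation latency. A minimal sketch (the helper names are made up
for illustration; the inputs are the rough numbers from this thread):

```python
def max_iops_qd1(latency_ms):
    """Upper bound on IOPS when only one operation is in flight at a time."""
    return 1000.0 / latency_ms


def implied_latency_ms(iops):
    """Average per-operation latency implied by a measured qd=1 IOPS figure."""
    return 1000.0 / iops


print(f"{max_iops_qd1(1.0):.0f} IOPS at 1.0 ms per operation")
print(f"{max_iops_qd1(1.4):.0f} IOPS at 1.4 ms per operation")
print(f"{implied_latency_ms(400):.1f} ms per operation at 400 IOPS")
```

Read the other way around, the 400 IOPS reported for the large
all-SSD cluster corresponds to about 2.5 ms per operation.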
/Peter
On 2024-11-26 at 21:09, Martin Gerhard Loschwitz wrote:
that would mean 2-3 ms of latency between hosts sitting right above
each other in the same rack, connected to the same switches.
Ping shows 0.2 ms of latency, though, for all three affected clusters.
So roughly 5000 iops. We can certainly add Ceph latency to that, but
that would mean Ceph eats 99% of the available performance, wouldn't it?
Also, that wouldn't explain why we're seeing a bit of improvement with
size=1 for a specific pool but not a massive improvement, given that
at least half of the latency is taken out of the equation in that case.
Best regards
Martin
Peter Linder <peter.linder@xxxxxxxxxxxxxx> wrote on Tue, 26 Nov 2024
at 20:52:
With qd=1 (queue depth?) and a single thread, this isn't totally
unreasonable.
Ceph will have an internal latency of around 1ms or so, add some
network to that and an operation can take 2-3ms. With a single
operation in flight all the time, this means 333-500 operations per
second. With hdds, even fewer.
What happens if you try again with many more threads?
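
As a rough illustration of why more threads should help: throughput is
approximately the number of operations in flight divided by the
per-operation latency. The sketch below assumes latency stays at
2-3 ms under load, which is idealized, and the value 16 is just an
example concurrency, not a recommendation:

```python
def iops(in_flight, latency_ms):
    """Idealized throughput: operations in flight divided by latency."""
    return in_flight * 1000.0 / latency_ms


for in_flight in (1, 16):
    low = iops(in_flight, 3.0)   # 3 ms per operation
    high = iops(in_flight, 2.0)  # 2 ms per operation
    print(f"in flight={in_flight}: {low:.0f}-{high:.0f} IOPS")
```

With one operation in flight this gives the 333-500 range above; in
practice latency tends to rise as concurrency grows, so real scaling
will be less than linear.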
On 2024-11-25 at 15:22, Martin Gerhard Loschwitz wrote:
> Folks,
>
> I am getting somewhat desperate debugging multiple setups here
> within the same environment. Three clusters, two SSD-only, one
> HDD-only, and what they all have in common is abysmal 4k IOPS
> performance when measuring with "rados bench". Abysmal means: In
> an all-SSD cluster I get roughly 400 IOPS over more than 250
> devices. I know SAS SSDs are not ideal, but that looks a bit on
> the low side of things to me.
>
> In the second cluster, also all-SSD based, I get roughly 120 4k
> IOPS. And the HDD-only cluster delivers 60 4k IOPS. The latter
> two both with substantially fewer devices, granted. But even with 20
> HDDs, 68 4k IOPS seems like a very bad value to me.
>
> I've tried to rule out everything I know of: BIOS
> misconfigurations, HBA problems, networking trouble (I am seeing
> comparably bad values with a size=1 pool) and so on. But to no
> avail. Has anybody dealt with something similar on Dell hardware
> or in general? What could cause such extremely bad benchmark
> results?
>
> I measure with rados bench and qd=1 at 4k block size. "ceph tell
> osd bench" with 4k blocks yields 30k+ IOPS for every single device
> in the big cluster, and all that leads to is 400 IOPS in total
> when writing to it? Even with no replication in place? That looks
> a bit off, doesn't it? Any help will be greatly appreciated. Even
> a pointer in the right direction would be held in high esteem
> right now. Thank you very much in advance!
>
> Best regards
> Martin
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx