Not really. I'm assuming they have been working hard at it, and I remember
hearing that a more recent RocksDB version shaved off significant time. It
also depends on your CPU and memory speed. I wouldn't be at all surprised
if latency is lower today, but I haven't really measured it lately.
/Peter
On 2024-11-26 at 22:03, Marc wrote:
In my experience, Ceph adds around 1ms even on localhost. Whether this
happens in the client code or on the OSDs, I don't really know. I don't
even know the precise reason, but the latency is there nevertheless.
Perhaps you can find the reason among the tradeoffs Ceph and similar
systems have to make to ensure consistency even when a partition can
happen at any time:
https://en.wikipedia.org/wiki/PACELC_theorem
With size=3, a write goes first to the primary OSD for the PG (0.1ms),
then from there in parallel to the two replica OSDs, which adds roughly
0.2ms of round trip, and finally the ack goes back to the client (0.1ms).
That is about 0.4ms of network round trips on top of Ceph's ~1ms internal
overhead, so very roughly 1.4ms even if storage latency were zero, which
it never is, even for SSDs.
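As a back-of-the-envelope sketch of that arithmetic (the per-hop and
overhead numbers below are illustrative assumptions, not measurements):

# Rough write-latency budget for a size=3 pool; numbers are assumptions.
ONE_WAY_HOP_MS = 0.1     # assumed one-way hop, client<->OSD or OSD<->OSD
CEPH_OVERHEAD_MS = 1.0   # assumed internal overhead (messenger, PG locking, ...)

client_to_primary = ONE_WAY_HOP_MS        # client -> primary OSD
replication_rtt = 2 * ONE_WAY_HOP_MS      # primary -> replicas and back, in parallel
ack_to_client = ONE_WAY_HOP_MS            # primary -> client ack

network_total = client_to_primary + replication_rtt + ack_to_client
print(f"network round trips: {network_total:.1f} ms")                        # ~0.4 ms
print(f"with internal overhead: {network_total + CEPH_OVERHEAD_MS:.1f} ms")  # ~1.4 ms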
If you set size=1, you skip the step where the primary OSD replicates to
the two replicas, but you still have Ceph's internal latency as well as
the network latency to reach the primary OSD of whatever PG the object
maps to, which could be on any server. So expect a small improvement, but
not a big one.
With that said, a single thread will never exceed roughly 1,000 IOPS in a
typical setup: with ~1ms per synchronous operation, each write has to
complete before the next one can start, so 1 / 0.001s ≈ 1,000 ops/s is
the ceiling.
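If you want to check that on your own cluster, a single-threaded probe
along these lines with the python-rados bindings would do; the pool name,
object size and iteration count are placeholders, so adjust them and only
run it against a scratch pool:

import time
import rados

POOL = "testpool"       # placeholder: use a test pool, not production data
ITERATIONS = 1000
PAYLOAD = b"x" * 4096   # 4 KiB per object

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx(POOL)
try:
    start = time.perf_counter()
    for i in range(ITERATIONS):
        # write_full() blocks until the write is acked, i.e. after the
        # primary OSD has heard back from the replicas.
        ioctx.write_full(f"latency-probe-{i}", PAYLOAD)
    elapsed = time.perf_counter() - start
    print(f"avg latency: {elapsed / ITERATIONS * 1000:.2f} ms")
    print(f"single-threaded IOPS: {ITERATIONS / elapsed:.0f}")
finally:
    ioctx.close()
    cluster.shutdown()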
Do you have an idea of how this has progressed across the versions of the
last few years? I thought they were addressing this type of performance
issue. I can remember that when moving from direct disk access to LVM
there were also people complaining about added latency.