> In my experience, Ceph will add around 1 ms even if only on localhost.
> Whether this happens in the client code or on the OSDs, I don't really
> know. I don't even know the precise reason, but the latency is there
> nevertheless. Perhaps you can find the reason among the tradeoffs Ceph
> and similar systems have to make to ensure consistency even if a
> partition can happen at any time:
>
> https://en.wikipedia.org/wiki/PACELC_theorem
>
> With size=3, a write will first go to the primary OSD for the PG
> (0.1 ms), then from there to the two replica OSDs (in parallel), so
> about 0.2 ms more of round trips. Then back to the client, 0.1 ms.
> Add that ~0.4 ms of network to the ~1 ms of internal overhead and you
> get, very roughly, 1.4 ms even if storage latency were zero, which it
> never is, even for SSDs.
>
> If you set size=1, you can skip the step where the primary OSD
> replicates to the two replicas, but you still have Ceph's internal
> latency as well as the network latency to reach the primary OSD for
> whatever PG the object belongs to, which could be on any server. So
> expect a small improvement, but not too much.
>
> With that said, a single thread will never exceed 1000 IOPS in a
> typical setup.

Do you have an idea of how this has progressed across the releases of
the last few years? I thought they were addressing this type of
performance issue. I remember that when moving from direct disk access
to using LVM there were also people complaining about added latency.
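
For what it's worth, here is a small back-of-the-envelope model of the
write path described in the quote above. It is only a sketch: the
0.1 ms per network hop and the ~1 ms of internal overhead are the
assumptions stated in that post, not measurements, and the real write
path is more involved than this.

# Back-of-the-envelope model of the replicated write path quoted above.
# All numbers are the assumptions from that post, not measured values.

NET_HOP_MS = 0.1        # one-way client <-> OSD / OSD <-> OSD network hop
CEPH_INTERNAL_MS = 1.0  # rough per-op overhead inside Ceph itself

def write_latency_ms(size: int) -> float:
    """Estimate per-write latency for a pool with `size` replicas."""
    # Client -> primary OSD, and primary OSD -> client at the end.
    latency = 2 * NET_HOP_MS
    # The primary replicates to the (size - 1) replicas in parallel, so it
    # costs one extra round trip regardless of how many replicas there are.
    if size > 1:
        latency += 2 * NET_HOP_MS
    return latency + CEPH_INTERNAL_MS

for size in (3, 1):
    lat = write_latency_ms(size)
    # A single synchronous client thread issues one write at a time, so its
    # IOPS ceiling is simply one second divided by the per-op latency.
    print(f"size={size}: ~{lat:.1f} ms per write, "
          f"~{1000 / lat:.0f} IOPS from a single thread")

With those assumptions, size=3 comes out at roughly 1.4 ms per write
(around 700 IOPS from a single synchronous thread) and size=1 at
roughly 1.2 ms, which matches the "small improvement but not too much"
in the quote.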