Re: Optimizations on "high" latency Ceph clusters


 



----- On 2 Oct 24, at 16:21, Victor Rodriguez <vrodriguez@xxxxxxxxxxxxx> wrote:

>> Hi,

>> What makes this cluster a non-local cluster?

> It's hosted in OVH's 3AZ, with each host in a different DC, each around
> 30-60 km away from the others, hence the relatively high latency.
Yeah, 0.6 to 1 ms are expected latencies for such long distances.

>> 0.6 to 1 millisecond RTT latency seems too high for all-flash clusters and
>> intense 4K write workloads.

> I'm fully aware of the limitations imposed by the latency. I was wondering if
> there is something that could be done to improve performance under these
> conditions.
Beyond the limits of physics, no. 

> Measured performance is more than enough for the workloads that the cluster will
> host, as 4k QD=1 sync writes/reads are not the main I/O pattern.
So in the end, the 4K I/O benchmark isn't that important; whether you can use this cluster will depend solely on the workload.
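(For reference, a run like this with fio against an RBD image is what usually shows that latency floor; the pool, image and client names below are just placeholders:

    # 4K random writes, queue depth 1, against an RBD image via librbd
    fio --name=4k-qd1-write --ioengine=rbd --clientname=admin \
        --pool=testpool --rbdname=testimage \
        --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
        --direct=1 --runtime=60 --time_based

Each write has to wait for the replicas in the other DCs before it is acknowledged, which is where the 1.2ms+ figure below comes from.)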

>> The upmap-read or read balancer modes may help with reads but not writes where
>> 1.2ms+ latency will still be observed.

> AFAIK upmap-read isn't available in Reef; at least it does not show up in the
> docs [1].

> Thanks!
You're right, I forgot you're using Reef. Still, you could try setting pg-upmap-primary manually, if I'm not mistaken.
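Something along these lines, where the PG ID and OSD number are only placeholder examples (and, if I remember correctly, the cluster has to require Reef-aware clients first):

    # only Reef-aware clients understand pg-upmap-primary mappings
    ceph osd set-require-min-compat-client reef

    # make OSD 0 the primary for PG 2.3 (both values are examples)
    ceph osd pg-upmap-primary 2.3 0

    # undo the mapping later if needed
    ceph osd rm-pg-upmap-primary 2.3

Keep in mind this only moves the primary that serves reads; writes still have to wait for the replicas in the other DCs.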

Regards, 
Frédéric. 

> [1] https://docs.ceph.com/en/reef/rados/operations/balancer/

>> Regards,
>> Frédéric.

>> ----- On 1 Oct 24, at 18:23, Victor Rodriguez <vrodriguez@xxxxxxxxxxxxx> wrote:

>>> Hello,

>>> I'm trying to get the most out of a Ceph Reef 3-node cluster, with 6 NVMe
>>> OSDs per node. RTT between nodes is between 0.6 and 1 millisecond. Obviously
>>> performance isn't as good as with local clusters, which usually sit around ~0.2 ms.
>>> 4K QD=1 I/O write requests are the most affected by this, as expected,
>>> with a performance loss of around 70% compared to an all-local cluster with
>>> similar hardware. In general, measurements match the increased latency.

>>> Are there any guidelines about what to tune or adjust to get the most out of
>>> this kind of setup with "high" latencies?

>>> Thanks in advance!

>>> Victor

>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



