Ceph performs poorly with even mildly "high" latency. It performs best when the whole cluster sits inside the same DC, and best of all when every server is layer-2 adjacent. Stretching a single cluster across multiple data centers is not advised. Instead, use RBD mirroring to replicate RBD images between clusters, or RGW multisite for cross-DC S3 workloads (some rough command sketches follow at the end of this message).

From: Victor Rodriguez <vrodriguez@xxxxxxxxxxxxx>
Date: Wednesday, October 2, 2024 at 10:21 AM
To: Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxx>
Subject: Re: Optimizations on "high" latency Ceph clusters

> Hi,
>
> What makes this cluster a non-local cluster?

It's hosted in OVH's 3AZ, with each host in a different DC, roughly 30-60 km apart, hence the relatively high latency.

> 0.6 to 1 millisecond RTT latency seems too high for all-flash clusters and intense 4K write workloads.

I'm fully aware of the limitations imposed by the latency. I was wondering if there is something that could be done to improve performance under these conditions. The measured performance is more than enough for the workloads that the cluster will host, as 4k QD=1 sync writes/reads are not the main I/O pattern.

> The upmap-read or read balancer modes may help with reads, but not with writes, where 1.2 ms+ latency will still be observed.

AFAIK upmap-read isn't available in Reef; at least it doesn't show up in the docs [1].

Thanks!

[1] https://docs.ceph.com/en/reef/rados/operations/balancer/

> Regards,
> Frédéric.
>
> ----- On 1 Oct 24, at 18:23, Victor Rodriguez <vrodriguez@xxxxxxxxxxxxx> wrote:
>
>> Hello,
>>
>> I'm trying to get the most from a Ceph Reef 3-node cluster, with 6 NVMe
>> OSDs per node. RTT between nodes is between 0.6 and 1 millisecond. Obviously
>> performance isn't as good as with local clusters, which usually sit around ~0.2 ms.
>> 4k QD=1 write requests are the most affected by this, as expected, with a
>> performance loss of around 70% compared to an all-local cluster with similar
>> hardware. In general, the measurements match the increased latency.
>>
>> Are there any guidelines about what to tune or adjust to get the most out of
>> this kind of setup with "high" latencies?
>>
>> Thanks in advance!
>>
>> Victor
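For reference, a minimal sketch of what setting up snapshot-based RBD mirroring between two clusters could look like. The pool name, site names, image name, token path and schedule below are placeholders, not anything taken from this thread:

    # On the primary cluster: enable per-image mirroring on the pool and
    # create a bootstrap token for the peer site.
    rbd mirror pool enable data image
    rbd mirror pool peer bootstrap create --site-name dc-a data > /tmp/bootstrap-token

    # On the secondary cluster (which must run an rbd-mirror daemon):
    # import the bootstrap token to register the peer.
    rbd mirror pool peer bootstrap import --site-name dc-b data /tmp/bootstrap-token

    # Enable snapshot-based mirroring for a given image and schedule mirror snapshots.
    rbd mirror image enable data/vm-disk-1 snapshot
    rbd mirror snapshot schedule add --pool data --image vm-disk-1 10m

Snapshot-based mirroring is asynchronous, so the inter-DC latency only affects how far the secondary copy lags, not the latency of client I/O on the primary.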
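On the balancer point: a quick way to confirm which balancer mode a Reef cluster is actually running. This is only a generic sketch; the read-optimizing upmap-read mode appears to have landed in releases after Reef, which matches it being absent from the Reef docs:

    # Show whether the balancer is active and which mode is configured.
    ceph balancer status

    # Reef's balancer offers crush-compat and upmap; upmap evens out PG
    # placement but does not reduce per-write latency.
    ceph balancer mode upmap
    ceph balancer on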
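And for anyone wanting to reproduce the 4k QD=1 sync-write numbers discussed above, a generic fio invocation along these lines is a reasonable starting point. The filename and size are placeholders; point it at a file or mapped RBD device backed by the cluster under test:

    # Single-job 4k random writes at queue depth 1, with an fsync after each
    # write, so every I/O pays the full network round trip -- the worst case
    # for a latency-bound cluster.
    fio --name=4k-qd1-sync-write --filename=/mnt/rbd/testfile --size=1G \
        --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --iodepth=1 \
        --numjobs=1 --fsync=1 --time_based --runtime=60 --group_reporting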
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx