> during scrubbing, OSD latency spikes to 300-600 ms,

I have seen Ceph clusters spike to several seconds per IO
operation, as they were designed for the same goals.

> resulting in sluggish performance for all VMs. Additionally,
> some OSDs fail during the scrubbing process.

Most likely they time out because of IO congestion rather than
failing outright.

> In such instances, promptly halting the scrubbing resolves the
> issue.

> (6 SSD node + 6 HDD node) All nodes are connected through 10G
> bonded link, i.e. 10Gx2=20Gb for each node.
>
>   64 SSD + 42 HDD = 106 OSDs
>
>   one-ssd         256  active+clean
>   one-hdd         512  active+clean
>   cloudstack.hdd  512  active+clean

Your Ceph cluster has been optimized for high latency and IO
congestion, goals that are surprisingly quite common, and it is
performing well given its design parameters (it is far from
full; if it becomes fuller it will achieve its goals even
better).

https://www.sabi.co.uk/blog/15-one.html?150305#150305
  "How many VMs per disk arm?"
https://www.sabi.co.uk/blog/15-one.html?150329#150329
  "CERN's old large disk discussion and IOPS-per-TB"

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
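P.S. For the record, "promptly halting the scrubbing" is usually
done with the standard cluster-wide flags; a sketch of the usual
procedure (operational commands against a live cluster, adapt to
your situation):

```shell
# Stop scheduling new scrubs and deep-scrubs cluster-wide;
# in-flight scrub chunks still drain, but client IO recovers.
ceph osd set noscrub
ceph osd set nodeep-scrub

# Once latency is back to normal, re-enable them:
ceph osd unset noscrub
ceph osd unset nodeep-scrub

# Longer term, throttling scrubs is gentler than banning them,
# e.g. sleep between scrub chunks (value in seconds):
ceph tell 'osd.*' injectargs '--osd_scrub_sleep 0.1'
```

Banning scrubs indefinitely just defers the IO cost and forfeits
data-consistency checking, so the throttle is the better
long-term answer.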
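P.P.S. The arithmetic behind the "VMs per disk arm" link is
worth spelling out; a back-of-envelope sketch where the 100
IOPS-per-arm budget, 3x replication, and 200 VMs are illustrative
assumptions of mine, not figures from the cluster above:

```python
# Back-of-envelope "VMs per disk arm" arithmetic.
# All inputs except HDD_OSDS are assumed, not measured.

HDD_OSDS = 42        # HDD OSDs in the cluster described above
IOPS_PER_ARM = 100   # rough small-random-IO budget of one 7200rpm arm
REPLICATION = 3      # assumed replica count: each write lands on 3 arms

# Aggregate random-write IOPS the HDD tier can offer to clients:
client_iops = HDD_OSDS * IOPS_PER_ARM / REPLICATION
print(client_iops)   # 1400.0

# A deep scrub reads every object on the same arms, so even a
# modest per-VM demand leaves very little headroom:
VMS = 200            # hypothetical VM count
iops_per_vm = client_iops / VMS
print(iops_per_vm)   # 7.0
```

Seven random IOPS per VM is why "sluggish performance for all
VMs" during scrubbing is the expected outcome of this design,
not an anomaly.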