Hi community,

I'm currently facing a significant issue with my Ceph cluster. The cluster consists of 10 nodes, each equipped with 6 x 960GB SSDs used for block.db and 18 x 12TB HDDs used for data, with 2x10Gbps bonded networking for the public and cluster networks. I am using a 4+2 erasure-coded pool for RBD.

When one node becomes unavailable, the cluster starts recovery, and soon afterwards slow ops are logged against the disks, impacting the entire cluster. After that, additional nodes are marked as failed. Could this be caused by the performance of the SSDs and HDDs? When I check disk I/O with iostat, disk utilization reaches 80-90%.

Is combining HDDs with SAS SSDs in Ceph a choice that leads to poor performance? My Ceph cluster has a bandwidth of about 1.9GB/s.

Thanks, and I hope someone can help me.
----------------------------------------------------------------------------
Email: tranphong079@xxxxxxxxx
Skype: tranphong079
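
P.S. In case it is useful, this is roughly how I have been collecting the numbers above (osd.12 is just a placeholder for one of the OSDs that reported slow ops, not a specific daemon):

    # overall health and which OSDs report slow ops
    ceph -s
    ceph health detail

    # per-disk utilization and latency, extended stats every 5 seconds
    iostat -x 5

    # inspect in-flight and historic slow ops on a suspect OSD
    ceph daemon osd.12 dump_ops_in_flight
    ceph daemon osd.12 dump_historic_slow_ops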
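
P.P.S. To reduce the impact of recovery on client I/O I am considering the throttles below. This is only a sketch of what I plan to try, and it assumes these options take effect on my release; I understand that with the mClock scheduler (Quincy and later) the profile may take precedence over the individual recovery options, so please correct me if this is the wrong direction:

    # limit backfill/recovery concurrency per OSD
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1

    # add a small sleep between recovery ops on HDD-backed OSDs
    ceph config set osd osd_recovery_sleep_hdd 0.1

    # with the mClock scheduler, prefer client I/O over recovery
    ceph config set osd osd_mclock_profile high_client_ops

Would it also be advisable to temporarily set the nodown flag (ceph osd set nodown) while overloaded OSDs are flapping during recovery, and unset it afterwards?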