Slow ops during index pool recovery cause cluster performance to drop to 1%

"Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx> · Sat, 2 Nov 2024 05:45:24 +0000

Hi,

I'm upgrading from Octopus to Quincy, and whenever index pool recovery kicks off in our clusters, cluster operations drop to 1% and slow ops keep coming non-stop.
Recovery takes 1-2 hours per node.

What I can see is that iowait on the NVMe drives backing the index pool is quite high, even though throughput is under 500 MB/s and IOPS are under 5,000/s.

The index pool is a 3/2 replicated pool (size 3, min_size 2) with 2048 PGs on 156 OSDs (each NVMe drive hosts 4 OSDs, because we experienced latency issues with 1 or 2 OSDs per NVMe).

Assuming the NVMe drives really are slow even under this very small load, how could we ease recovery and get rid of this cluster performance drop?
Would increasing the replica count to 4 or 5 help? Perhaps the pool could then tolerate more slow PGs?
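To put numbers on the layout and on the size 4/5 question, here is some back-of-the-envelope arithmetic. It is only a sketch assuming PG replicas are spread perfectly evenly across the 156 OSDs (CRUSH only approximates that), and it counts PG replicas rather than primaries; the helper names are hypothetical.

```python
# Back-of-the-envelope PG distribution for the index pool described in the message.
# Assumption: PG replicas are spread evenly across OSDs (CRUSH only approximates this).

PG_NUM = 2048       # PGs in the index pool
NUM_OSDS = 156      # OSDs backing the index pool
OSDS_PER_NVME = 4   # each NVMe drive is split into 4 OSDs


def pg_replicas_per_osd(pg_num: int, size: int, num_osds: int) -> float:
    """Average number of PG replicas each OSD holds for a replicated pool with `size` copies."""
    return pg_num * size / num_osds


def layout(size: int) -> dict:
    """Summarize total PG replicas and the average load per OSD and per NVMe drive."""
    per_osd = pg_replicas_per_osd(PG_NUM, size, NUM_OSDS)
    return {
        "size": size,
        "pg_replicas_total": PG_NUM * size,
        "per_osd": round(per_osd, 1),
        "per_nvme_drive": round(per_osd * OSDS_PER_NVME, 1),
    }


if __name__ == "__main__":
    print(f"NVMe drives: {NUM_OSDS // OSDS_PER_NVME}")
    for size in (3, 4, 5):  # current size 3, plus the 4 and 5 under consideration
        print(layout(size))
```

With size 3 this works out to roughly 39 PG replicas per OSD and about 158 per NVMe drive; each extra replica adds about 13 more per OSD, i.e. one more full copy of every index object that has to be written and recovered.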

FYI, we have a very large number of objects in the cluster, more than 4 billion (objects: 4.06G objects, 616 TiB).
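For scale, the two figures give a rough average object size. This assumes "4.06G" means 4.06 × 10^9 (decimal prefix) and that 616 TiB is the logical data size printed next to the object count; the average blends all pools and says nothing specific about the index pool, whose bucket index entries live mostly in omap rather than in object data.

```python
# Rough average object size from the cluster-wide figures quoted in the message.
# Assumptions: "4.06G objects" = 4.06 * 10**9 objects, "616 TiB" = logical stored bytes.

TIB = 2**40
objects = 4.06e9
stored_bytes = 616 * TIB

avg_bytes = stored_bytes / objects
print(f"average object size: {avg_bytes:,.0f} bytes (~{avg_bytes / 1024:.0f} KiB)")
```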

Still, I think the cluster should be able to handle recovery without such a penalty.

So far, the only relevant messages I can find in the slow OSD's log (at the default debug level) come from "get_health_metrics":

2024-11-02T12:38:40.762+0700 7f241bc25640  0 log_channel(cluster) log [WRN] : 6 slow requests (by type [ 'delayed' : 6 ] most affected pool [ 'hkg.rgw.buckets.index' : 6 ])
2024-11-02T12:38:41.802+0700 7f241bc25640 -1 osd.110 626281 get_health_metrics reporting 7 slow ops, oldest is osd_op(client.3641786447.0:2194661324 26.588 26:11aa561a:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.503457179.1.10:head [call rgw.bucket_list in=47b] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e626262)
2024-11-02T12:38:41.802+0700 7f241bc25640  0 log_channel(cluster) log [WRN] : 7 slow requests (by type [ 'delayed' : 7 ] most affected pool [ 'hkg.rgw.buckets.index' : 7 ])
2024-11-02T12:38:42.782+0700 7f241bc25640 -1 osd.110 626282 get_health_metrics reporting 7 slow ops, oldest is osd_op(client.3641786447.0:2194661324 26.588 26:11aa561a:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.503457179.1.10:head [call rgw.bucket_list in=47b] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e626262)
2024-11-02T12:38:42.782+0700 7f241bc25640  0 log_channel(cluster) log [WRN] : 7 slow requests (by type [ 'delayed' : 7 ] most affected pool [ 'hkg.rgw.buckets.index' : 7 ])
2024-11-02T12:38:43.802+0700 7f241bc25640 -1 osd.110 626282 get_health_metrics reporting 7 slow ops, oldest is osd_op(client.3641786447.0:2194661324 26.588 26:11aa561a:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.503457179.1.10:head [call rgw.bucket_list in=47b] snapc 0=[] ondisk+read+known_if_redirected+supports_pool_eio e626262)
2024-11-02T12:38:43.802+0700 7f241bc25640  0 log_channel(cluster) log [WRN] : 7 slow requests (by type [ 'delayed' : 7 ] most affected pool [ 'hkg.rgw.buckets.index' : 7 ])
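To see which OSDs, PGs and bucket index objects keep showing up as the oldest slow op across many log lines, the "get_health_metrics reporting N slow ops" lines can be tallied. This is a minimal sketch; the regex is inferred only from the sample lines quoted here, so other Ceph versions or op types may format `osd_op(...)` differently, and the function names are hypothetical.

```python
import re
from collections import Counter

# Field layout inferred from the sample OSD log lines in this thread only.
SLOW_OP_RE = re.compile(
    r"osd\.(?P<osd>\d+) \d+ get_health_metrics reporting (?P<count>\d+) slow ops, "
    r"oldest is osd_op\((?P<client>\S+) (?P<pgid>[0-9a-f]+\.[0-9a-f]+) "
    r"(?P<object>\S+) \[(?P<ops>[^\]]*)\]"
)

SAMPLE_LINE = (
    "2024-11-02T12:38:42.782+0700 7f241bc25640 -1 osd.110 626282 get_health_metrics "
    "reporting 7 slow ops, oldest is osd_op(client.3641786447.0:2194661324 26.588 "
    "26:11aa561a:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.503457179.1.10:head "
    "[call rgw.bucket_list in=47b] snapc 0=[] "
    "ondisk+read+known_if_redirected+supports_pool_eio e626262)"
)


def parse_slow_op(line: str):
    """Return a dict describing the oldest slow op on a log line, or None if it doesn't match."""
    m = SLOW_OP_RE.search(line)
    if m is None:
        return None
    # The object field looks like "<pool>:<hash>:::<name>:<snap>"; keep only the name.
    name = m["object"].split(":::", 1)[-1].rsplit(":", 1)[0]
    return {
        "osd": int(m["osd"]),
        "slow_ops": int(m["count"]),
        "pgid": m["pgid"],
        "object": name,
        "ops": m["ops"],
    }


def summarize(lines):
    """Count how often each (osd, pgid, object) appears as the oldest slow op."""
    hits = Counter()
    for line in lines:
        op = parse_slow_op(line)
        if op is not None:
            hits[(op["osd"], op["pgid"], op["object"])] += 1
    return hits


if __name__ == "__main__":
    print(parse_slow_op(SAMPLE_LINE))
```

Feeding a whole OSD log through `summarize` would show whether the slow ops concentrate on a few bucket index shards (the `.dir.*` objects) or are spread across the pool.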

We also try to make the process smoother: after the upgrade and machine reboot, compaction kicks off and generates 30-40 iowait on the node, so we set the "noup" flag to keep these OSDs from being marked up until compaction has finished. Once iowait is back to 0 after compaction, I unset noup so recovery can start, and that is when the issue above appears. Without noup, the problem would be even worse.
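The noup workflow can be scripted roughly as follows. This is a sketch under several assumptions: `ceph osd set noup` was already issued before the OSDs were restarted, it runs on the OSD host (to read its `/proc/stat`) with a working `ceph` CLI, node-wide CPU iowait is used as a proxy for "compaction finished" just as in the message, and the 1% threshold and function names are hypothetical. It defaults to a dry run that only prints the command.

```python
import subprocess
import time
from typing import List

# Linux /proc/stat "cpu" fields: user nice system idle iowait irq softirq steal guest guest_nice.
IOWAIT_IDX = 4
NON_GUEST_FIELDS = 8  # guest/guest_nice are already included in user/nice


def read_cpu_times(stat_text: str) -> List[int]:
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line in /proc/stat text")


def iowait_percent(before: List[int], after: List[int]) -> float:
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    deltas = [a - b for a, b in zip(after[:NON_GUEST_FIELDS], before[:NON_GUEST_FIELDS])]
    total = sum(deltas)
    return 100.0 * deltas[IOWAIT_IDX] / total if total > 0 else 0.0


def sample_iowait(interval_s: float = 5.0) -> float:
    """Measure node-wide iowait over `interval_s` seconds (run on the OSD host)."""
    with open("/proc/stat") as f:
        before = read_cpu_times(f.read())
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        after = read_cpu_times(f.read())
    return iowait_percent(before, after)


def ceph(*args: str, dry_run: bool = True) -> None:
    """Run a ceph CLI command, or only print it when dry_run is True."""
    cmd = ["ceph", *args]
    if dry_run:
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)


def unset_noup_after_compaction(threshold_pct: float = 1.0, dry_run: bool = True) -> None:
    """Wait until iowait drops below threshold_pct, then let the OSDs be marked up."""
    while (pct := sample_iowait()) > threshold_pct:
        print(f"iowait {pct:.1f}% > {threshold_pct}%, compaction presumably still running")
    ceph("osd", "unset", "noup", dry_run=dry_run)
```

Unsetting noup here is exactly the point where recovery starts and the slow ops described above begin, so this only automates the current procedure; it does not by itself address the recovery impact.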

Thank you for your help.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx