Daily slow ops around the same time on different OSDs

Hi,

For some reason I receive slow ops daily, mostly affecting the rgw.log pool (which is quite small: 32 PGs / 57k objects / 9.5 GB of data).

Could the issue be related to RocksDB tasks?
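If it is RocksDB (e.g. compaction kicking in on a schedule), I assume something like this on the affected OSD should show it (just a rough sketch, I have not dug into it yet):

# rocksdb perf counters from the admin socket (compactions, latencies)
ceph daemon osd.461 perf dump rocksdb

# temporarily raise rocksdb logging so compaction events show up in the OSD log
ceph tell osd.461 config set debug_rocksdb 4/5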

Some log lines:
...
2024-06-18T14:17:43.849+0700 7fd8002ea700  1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7fd8002ea700' had timed out after 15
2024-06-18T14:17:50.865+0700 7fd81a31e700 -1 osd.461 472771 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.3349483990.0:462716190 22.11 22:8deed178:::meta.log.08e85e9f-c16e-43f0-b88d-362f3b7ced2d.15:head [call log.list in=69b] snapc 0=[] ondisk+read+known_if_redirected e472771)
2024-06-18T14:17:50.865+0700 7fd81a31e700  0 log_channel(cluster) log [WRN] : 1 slow requests (by type [ 'delayed' : 1 ] most affected pool [ 'dc.rgw.log' : 1 ])
2024-06-18T14:19:36.040+0700 7fd81e5b4700  1 heartbeat_map is_healthy 'OSD::osd_op_tp
...
2024-06-18T14:19:51.908+0700 7fd81a31e700 -1 osd.461 472771 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.3371539359.0:5095510 22.11 22:8df1b9e1:::datalog.sync-status.shard.61c9d940-fde4-4bed-9389-edc8d7741817.111:head [call lock.lock in=64b] snapc 0=[] ondisk+write+known_if_redirected e472771)
2024-06-18T14:19:51.908+0700 7fd81a31e700  0 log_channel(cluster) log [WRN] : 2 slow requests (by type [ 'delayed' : 2 ] most affected pool [ 'dc.rgw.log' : 2 ])
2024-06-18T14:19:52.899+0700 7fd81a31e700 -1 osd.461 472771 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.3371539359.0:5095510 22.11 22:8df1b9e1:::datalog.sync-status.shard.61c9d940-fde4-4bed-9389-edc8d7741817.111:
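In case it matters, this is how I am pulling the details of those slow ops on the affected OSD (the per-event timestamps should show at which stage the op gets stuck):

# recently completed slow ops with their per-event timestamps
ceph daemon osd.461 dump_historic_slow_ops

# ops currently in flight, in case one is stuck right now
ceph daemon osd.461 dump_ops_in_flight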

I found this thread: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/FR5V466HBXGRVL3Z3RAFKUPQ2FGK2T53/
but I don't think it is relevant to my case.

We use only SSDs, without a separate RocksDB device, so we don't have spillover.
The CPU is not overloaded (~80% idle); however, it's interesting that in vmstat the "r" column is quite high, which indicates processes waiting for the CPU:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
12  0      0 6205288 329803136 15633844    0    0  2149  1894    0    0  5  3 92  0  0
17  0      0 6156780 329827840 15637252    0    0 125584 250416 646098 1965051 10  8 82  0  0
21  0      0 6154636 329849504 15636112    0    0 99320 245024 493324 1682377  7  6 87  0  0
16  0      0 6144256 329869664 15636924    0    0 87484 301968 623345 1993374  8  7 84  0  0
19  0      0 6057012 329890080 15637932    0    0 160444 303664 549194 1820562  8  6 85  0  0
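To see which threads are piling up in the run queue, I guess sampling per-thread CPU usage would help, something like this (pidstat is from sysstat; the pgrep part is just a sketch and assumes you pick one ceph-osd process on the node):

# per-thread CPU usage for a single ceph-osd process, sampled every second
pidstat -t -p $(pgrep -of ceph-osd) 1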

Any idea what this could indicate on Octopus 15.2.17 with Ubuntu 20.04?

Thank you




