Hi,
if it happens around the same time every day, that sounds like deep-scrubbing. Did you verify whether the OSDs serving the mentioned pool were deep-scrubbed around that time? The primary OSDs of the affected PGs would log that. Is that pool really so heavily used, or could there be a single failing disk?
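For example, something like this could show it (pool id 22 is taken from the PG ids in your osd_op lines; log paths assume the default layout):

# scrub / deep-scrub timestamps of the PGs in that pool
ceph pg ls-by-pool dc.rgw.log

# up/acting set and primary of one of the affected PGs
ceph pg map 22.11

# scrub events in the cluster log on a mon host, filtered to pool 22
grep 'deep-scrub' /var/log/ceph/ceph.log | grep ' 22\.'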
Quoting "Szabo, Istvan (Agoda)" <Istvan.Szabo@xxxxxxxxx>:
Hi,
For some reason I receive slow ops daily, and they affect the rgw.log
pool the most (which is pretty small: 32 PGs / 57k objects / 9.5 GB of data).
Could the issue be related to RocksDB tasks (see the sketch after the log lines below)?
Some log lines:
...
2024-06-18T14:17:43.849+0700 7fd8002ea700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7fd8002ea700' had timed out after 15
2024-06-18T14:17:50.865+0700 7fd81a31e700 -1 osd.461 472771 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.3349483990.0:462716190 22.11 22:8deed178:::meta.log.08e85e9f-c16e-43f0-b88d-362f3b7ced2d.15:head [call log.list in=69b] snapc 0=[] ondisk+read+known_if_redirected e472771)
2024-06-18T14:17:50.865+0700 7fd81a31e700 0 log_channel(cluster) log [WRN] : 1 slow requests (by type [ 'delayed' : 1 ] most affected pool [ 'dc.rgw.log' : 1 ])
2024-06-18T14:19:36.040+0700 7fd81e5b4700 1 heartbeat_map is_healthy 'OSD::osd_op_tp
...
2024-06-18T14:19:51.908+0700 7fd81a31e700 -1 osd.461 472771 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.3371539359.0:5095510 22.11 22:8df1b9e1:::datalog.sync-status.shard.61c9d940-fde4-4bed-9389-edc8d7741817.111:head [call lock.lock in=64b] snapc 0=[] ondisk+write+known_if_redirected e472771)
2024-06-18T14:19:51.908+0700 7fd81a31e700 0 log_channel(cluster) log [WRN] : 2 slow requests (by type [ 'delayed' : 2 ] most affected pool [ 'dc.rgw.log' : 2 ])
2024-06-18T14:19:52.899+0700 7fd81a31e700 -1 osd.461 472771 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.3371539359.0:5095510 22.11 22:8df1b9e1:::datalog.sync-status.shard.61c9d940-fde4-4bed-9389-edc8d7741817.111:
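One way to check whether these slow ops coincide with RocksDB work on osd.461; the admin socket commands are standard, but the log path and whether compaction shows up at the configured debug levels are assumptions:

# recent slow / in-flight ops as seen by the op tracker (run on the host of osd.461)
ceph daemon osd.461 dump_historic_slow_ops
ceph daemon osd.461 dump_ops_in_flight

# any RocksDB compaction activity logged around 14:17-14:19?
grep -i 'rocksdb' /var/log/ceph/ceph-osd.461.log | grep -i 'compact'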
I found this thread:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/FR5V466HBXGRVL3Z3RAFKUPQ2FGK2T53/
but I don't think it is relevant to my case.
We use only SSDs without a separate RocksDB device, so we don't have spillover.
The CPU is not overloaded (80% idle); however, it is interesting that in
vmstat the "r" column is pretty high, which indicates processes waiting for
CPU (a per-thread breakdown is sketched after the vmstat output below):
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b  swpd     free       buff      cache  si  so     bi     bo     in      cs  us sy id wa st
12  0     0  6205288  329803136  15633844   0   0   2149   1894      0       0   5  3 92  0  0
17  0     0  6156780  329827840  15637252   0   0 125584 250416 646098 1965051  10  8 82  0  0
21  0     0  6154636  329849504  15636112   0   0  99320 245024 493324 1682377   7  6 87  0  0
16  0     0  6144256  329869664  15636924   0   0  87484 301968 623345 1993374   8  7 84  0  0
19  0     0  6057012  329890080  15637932   0   0 160444 303664 549194 1820562   8  6 85  0  0
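A minimal sketch for that per-thread breakdown, assuming pidstat from the sysstat package is available and that the ceph-osd command line contains "--id 461" (adjust to whichever OSD is busy):

# per-thread CPU usage of osd.461, sampled every 5 seconds
pidstat -t -u -p $(pgrep -f 'ceph-osd .*--id 461 ') 5

# or watch the threads interactively
top -H -p $(pgrep -f 'ceph-osd .*--id 461 ')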
Any idea what this could indicate on Octopus 15.2.17 with Ubuntu 20.04?
Thank you
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx