On an HDD-based Quincy 17.2.5 cluster (with DB/WAL on datacenter-class NVMe with enhanced power loss protection), I sometimes (once or twice per week) see log entries similar to what I reproduced below (a bit trimmed):

Wed 2023-09-06 22:41:54 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22:41:54.429+0000 7f83d813d700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f83b9ec1700' had timed out after 15.000000954s
Wed 2023-09-06 22:41:54 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22:41:54.429+0000 7f83d793c700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f83b9ec1700' had timed out after 15.000000954s
Wed 2023-09-06 22:41:54 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22:41:54.469+0000 7f83d713b700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f83b9ec1700' had timed out after 15.000000954s
<many more repeats>
Wed 2023-09-06 22:42:04 UTC mds02 ceph-mgr@mds02.service[4954]: 2023-09-06T22:42:04.735+0000 7ffab1a20700 0 log_channel(cluster) log [DBG] : pgmap v227303: 5449 pgs: 1 active+clean+laggy, 7 active+clean+snaptrim, 34 active+remapped+backfilling, 13 active+clean+scrubbing, 4738 active+clean, 19 active+clean+scrubbing+deep, 637 active+remapped+backfill_wait; 2.0 PiB data, 2.9 PiB used, 2.2 PiB / 5.2 PiB avail; 103 MiB/s rd, 25 MiB/s wr, 2.76k op/s; 923967567/22817724437 objects misplaced (4.049%); 128 MiB/s, 142 objects/s recovering
<some more osd_op_tp threads timed out>
Wed 2023-09-06 22:42:06 UTC mds02 ceph-mgr@mds02.service[4954]: 2023-09-06T22:42:06.767+0000 7ffab1a20700 0 log_channel(cluster) log [DBG] : pgmap v227304: 5449 pgs: 2 active+clean+laggy, 7 active+clean+snaptrim, 34 active+remapped+backfilling, 13 active+clean+scrubbing, 4737 active+clean, 19 active+clean+scrubbing+deep, 637 active+remapped+backfill_wait; 2.0 PiB data, 2.9 PiB used, 2.2 PiB / 5.2 PiB avail; 88 MiB/s rd, 23 MiB/s wr, 3.85k op/s; 923967437/22817727283 objects misplaced (4.049%); 115 MiB/s, 126 objects/s recovering
<some more osd_op_tp threads timed out>
Wed 2023-09-06 22:42:15 UTC mds02 ceph-mgr@mds02.service[4954]: 2023-09-06T22:42:15.311+0000 7ffab1a20700 0 log_channel(cluster) log [DBG] : pgmap v227308: 5449 pgs: 1 active+remapped+backfill_wait+laggy, 3 active+clean+laggy, 7 active+clean+snaptrim, 34 active+remapped+backfilling, 13 active+clean+scrubbing, 4736 active+clean, 19 active+clean+scrubbing+deep, 636 active+remapped+backfill_wait; 2.0 PiB data, 2.9 PiB used, 2.2 PiB / 5.2 PiB avail; 67 MiB/s rd, 13 MiB/s wr, 3.21k op/s; 923966538/22817734196 objects misplaced (4.049%); 134 MiB/s, 137 objects/s recovering
<some more osd_op_tp threads timed out>
Wed 2023-09-06 22:42:22 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22:42:22.708+0000 7f83b9ec1700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f83b9ec1700' had timed out after 15.000000954s
<no more osd_op_tp threads time out>
Wed 2023-09-06 22:42:24 UTC ceph-osd06 ceph-osd@12.service[5527]: 2023-09-06T22:42:24.637+0000 7f4fbfcc3700 -1 osd.12 1031871 heartbeat_check: no reply from 10.3.0.9:6864 osd.39 since back 2023-09-06T22:41:54.079627+0000 front 2023-09-06T22:41:54.079742+0000 (oldest deadline 2023-09-06T22:42:24.579224+0000)
Wed 2023-09-06 22:42:24 UTC ceph-osd08 ceph-osd@126.service[5505]: 2023-09-06T22:42:24.671+0000 7feb033d8700 -1 osd.126 1031871 heartbeat_check: no reply from 10.3.0.9:6864 osd.39 since back 2023-09-06T22:41:53.996908+0000 front 2023-09-06T22:41:53.997184+0000 (oldest deadline 2023-09-06T22:42:24.496552+0000)
Wed 2023-09-06 22:42:24 UTC ceph-osd06 ceph-osd@75.service[5532]: 2023-09-06T22:42:24.797+0000 7fe15bc77700 -1 osd.75 1031871 heartbeat_check: no reply from 10.3.0.9:6864 osd.39 since back 2023-09-06T22:41:49.384037+0000 front 2023-09-06T22:41:49.384031+0000 (oldest deadline 2023-09-06T22:42:24.683399+0000)
<some more "no reply" complaints - curiously, most of them come from the old part of the cluster, to which the timed-out OSD also belongs>
Wed 2023-09-06 22:42:25 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:25.669+0000 7f8f72216700 1 mon.mds01@0(leader).osd e1031871 prepare_failure osd.39 v1:10.3.0.9:6860/5574 from osd.1 is reporting failure:1
Wed 2023-09-06 22:42:25 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:25.669+0000 7f8f72216700 0 log_channel(cluster) log [DBG] : osd.39 reported failed by osd.1
Wed 2023-09-06 22:42:25 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:25.905+0000 7f8f72216700 1 mon.mds01@0(leader).osd e1031871 prepare_failure osd.39 v1:10.3.0.9:6860/5574 from osd.408 is reporting failure:1
Wed 2023-09-06 22:42:25 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:25.905+0000 7f8f72216700 0 log_channel(cluster) log [DBG] : osd.39 reported failed by osd.408
Wed 2023-09-06 22:42:25 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:25.905+0000 7f8f72216700 1 mon.mds01@0(leader).osd e1031871 we have enough reporters to mark osd.39 down
Wed 2023-09-06 22:42:25 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:25.905+0000 7f8f72216700 0 log_channel(cluster) log [INF] : osd.39 failed (root=default,host=ceph-osd09) (2 reporters from different host after 32.232905 >= grace 30.000000)
Wed 2023-09-06 22:42:26 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:26.445+0000 7f8f74a1b700 0 log_channel(cluster) log [WRN] : Health check failed: 1 osds down (OSD_DOWN)
Wed 2023-09-06 22:42:26 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:26.453+0000 7f8f70212700 1 mon.mds01@0(leader).osd e1031872 e1031872: 459 total, 437 up, 438 in
Wed 2023-09-06 22:42:26 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:26.453+0000 7f8f70212700 0 log_channel(cluster) log [DBG] : osdmap e1031872: 459 total, 437 up, 438 in
<other MONs report the same>
Wed 2023-09-06 22:42:26 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22:42:26.532+0000 7f83d813d700 0 --1- v1:10.3.0.9:6862/5574 >> v1:10.3.0.7:6857/5579 conn(0x55cd63342000 0x55cd63344000 :-1 s=OPENED pgs=12691 cs=1079 l=0).fault initiating reconnect
Wed 2023-09-06 22:42:26 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22:42:26.532+0000 7f83d813d700 0 --1- v1:10.3.0.9:6862/5574 >> v1:10.3.0.7:6857/5579 conn(0x55cd63342000 0x55cd63344000 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=12691 cs=1080 l=0).handle_connect_reply_2 connect got RESETSESSION
Wed 2023-09-06 22:42:26 UTC ceph-osd07 ceph-osd@91.service[5579]: 2023-09-06T22:42:26.529+0000 7fbf8f3ed700 0 --1- v1:10.3.0.7:6857/5579 >> v1:10.3.0.9:6862/5574 conn(0x5588d67cac00 0x5588f210b000 :6857 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept we reset (peer sent cseq 1080), sending RESETSESSION
<some more reconnects and RESETSESSIONs>
Wed 2023-09-06 22:42:26 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:26.769+0000 7f8f72216700 0 log_channel(cluster) log [INF] : osd.39 marked itself dead as of e1031872
Wed 2023-09-06 22:42:26 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22:42:26.768+0000 7f83c4ed7700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.39 down, but it is still running
Wed 2023-09-06 22:42:26 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22:42:26.768+0000 7f83c4ed7700 0 log_channel(cluster) log [DBG] : map e1031872 wrongly marked me down at e1031872
Wed 2023-09-06 22:42:26 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22:42:26.768+0000 7f83c4ed7700 1 osd.39 1031872 start_waiting_for_healthy
Wed 2023-09-06 22:42:26 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22:42:26.772+0000 7f83c4ed7700 1 osd.39 1031872 is_healthy false -- only 0/12 up peers (less than 33%)
Wed 2023-09-06 22:42:26 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22:42:26.772+0000 7f83c4ed7700 1 osd.39 1031872 not healthy; waiting to boot
Wed 2023-09-06 22:42:27 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:27.481+0000 7f8f70212700 1 mon.mds01@0(leader).osd e1031873 e1031873: 459 total, 437 up, 438 in

Then osd.39 comes up by itself:

Wed 2023-09-06 22:42:28 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22:42:28.516+0000 7f83d20fe700 1 osd.39 1031873 tick checking mon for new map
Wed 2023-09-06 22:42:28 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:28.541+0000 7f8f70212700 0 log_channel(cluster) log [INF] : osd.39 v1:10.3.0.9:6860/5574 boot
Wed 2023-09-06 22:42:28 UTC mds01 ceph-mon@mds01.service[3258]: 2023-09-06T22:42:28.541+0000 7f8f70212700 0 log_channel(cluster) log [DBG] : osdmap e1031874: 459 total, 438 up, 438 in

1) Is this the same issue?

2) It is strange that the majority (but not all) of the OSDs reporting "no reply" come from the old part of the cluster. Is there a debug option that would let us discriminate between automatic compaction and network issues? At which debug level are automatic compactions announced?

I am asking question (2) because, prior to the first osd_op_tp timeout, there is nothing in the logs that announces a compaction.
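For what it is worth, here is what I plan to try the next time this happens, to tell a compaction stall apart from a network problem. This is only a rough sketch, not a recommendation: the debug_rocksdb level of 10/10 is a guess (which is exactly what question 2 is about), and the "ceph daemon" commands have to be run on the node hosting osd.39 (ceph-osd09 in the logs above).

  # note the current level so it can be restored later
  ceph daemon osd.39 config get debug_rocksdb

  # make RocksDB logging more verbose on the affected OSD only
  # (10/10 is a guess; the right level for seeing automatic compactions is my open question)
  ceph tell osd.39 config set debug_rocksdb 10/10

  # after the next osd_op_tp timeout, look at the compaction-related counters
  # (I expect the rocksdb section of perf dump to carry them)
  ceph daemon osd.39 perf dump | grep -i compact

  # and at the heartbeat ping times this OSD sees, to rule the network in or out
  # (0 = no threshold, i.e. list everything, if I read the help output correctly)
  ceph daemon osd.39 dump_osd_network 0

  # afterwards, put debug_rocksdb back to whatever "config get" reported
  # (4/5 here, which I believe is the default)
  ceph tell osd.39 config set debug_rocksdb 4/5

Corrections to the above are welcome, of course.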
On Thu, Sep 7, 2023 at 5:08 PM J-P Methot <jp.methot@xxxxxxxxxxxxxxxxx> wrote:
>
> We're talking about automatic online compaction here, not running the
> command.
>
> On 9/7/23 04:04, Konstantin Shalygin wrote:
> > Hi,
> >
> >> On 7 Sep 2023, at 10:05, J-P Methot <jp.methot@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> We're running latest Pacific on our production cluster and we've been
> >> seeing the dreaded 'OSD::osd_op_tp thread 0x7f346aa64700' had timed
> >> out after 15.000000954s' error. We have reasons to believe this
> >> happens each time the RocksDB compaction process is launched on an
> >> OSD. My question is, does the cluster detecting that an OSD has timed
> >> out interrupt the compaction process? This seems to be what's
> >> happening, but it's not immediately obvious. We are currently facing
> >> an infinite loop of random OSDs timing out and if the compaction
> >> process is interrupted without finishing, it may explain that.
> >
> > You run the online compacting for this OSD's (`ceph osd compact
> > ${osd_id}` command), right?
> >
> >
> >
> > k
>
> --
> Jean-Philippe Méthot
> Senior Openstack system administrator
> Administrateur système Openstack sénior
> PlanetHoster inc.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Alexander E. Patrakov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx