Dear all,

I was a bit too optimistic in my previous email. It looks like the leader election loop reappeared. I was able to work around it by stopping the rogue mon daemon, but I don't know how to fix it for good.

I'm running a 16.2.6 Ceph cluster on CentOS 7.9 servers (6 servers in total). Each server has about 35 HDDs and 4 SSDs. The servers have about 250 GB of RAM, and there is no memory pressure on any daemon. I have an identical mirror cluster that does not show the issue (but that one does not have its file system mounted elsewhere and runs no rgws). I recently migrated both clusters to cephadm and then from Octopus to Pacific.

The primary cluster has the following problems (the output below was pulled from the cluster before fixing/restarting the mon daemon):

- `ceph -s` and other commands feel pretty sluggish
- `ceph -s` shows inconsistent results in the "health" section and the "services" overview
- CephFS clients hang, and after rebooting the clients, mounting is not possible any more
- `ceph config dump` prints "monclient: get_monmap_and_config failed to get config"
- there is a mon leader election loop; the journalctl output of the affected mon is at the bottom
- the primary mds daemon logs things like "skipping upkeep work because connection to Monitors appears laggy" and "ms_deliver_dispatch: unhandled message 0x55ecdec1d340 client_session(request_renewcaps seq 88463) from client.60591566 v1:172.16.59.39:0/3197981635" in its journalctl output

I tried rebooting the client that is supposedly not reacting to cache pressure, but that did not help either. The servers are connected to the same VLT switch pair and use LACP 2x40GbE for the cluster network and 2x10GbE for the public network. I have disabled firewalld on the nodes, but that did not fix the problem either. I suspect the "laggy monitors" are more likely caused on the software side than on the network side. I took down the rogue mon.osd-1 with `docker stop`, and the problem seems to disappear after that.

To summarize: I suspect the cause is connected to the mon daemons. I have found that similar problems have been reported a couple of times. What is the best way forward? The general suggestion for such cases seems to be to just run `ceph orch redeploy mon`, so I did this. Is there any way to find the root cause and get rid of it for good? (I have put a sketch of the workaround commands I used and the diagnostics I would run next time at the very end of this mail, below the log excerpts.)

Best wishes,
Manuel

osd-1 # ceph -s
  cluster:
    id:     55633ec3-6c0c-4a02-990c-0f87e0f7a01f
    health: HEALTH_WARN
            1 clients failing to respond to cache pressure
            1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
            Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull

  services:
    mon: 5 daemons, quorum (age 4h), out of quorum: osd-1, osd-2, osd-5, osd-4, osd-3
    mgr: osd-4.oylrhe(active, since 2h), standbys: osd-1, osd-3, osd-5.jcfyqe, osd-2
    mds: 1/1 daemons up, 1 standby
    osd: 180 osds: 180 up (since 4h), 164 in (since 6h); 285 remapped pgs
    rgw: 12 daemons active (6 hosts, 2 zones)

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 5322 pgs
    objects: 263.18M objects, 944 TiB
    usage:   1.4 PiB used, 639 TiB / 2.0 PiB avail
    pgs:     25576348/789544299 objects misplaced (3.239%)
             5026 active+clean
             291  active+remapped+backfilling
             5    active+remapped+backfill_toofull

  io:
    client:   165 B/s wr, 0 op/s rd, 0 op/s wr
    recovery: 2.3 GiB/s, 652 objects/s

  progress:
    Global Recovery Event (53m)
      [==========================..]
      (remaining: 3m)

osd-1 # ceph health detail
HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
[WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
    mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to respond to cache pressure client_id: 56229355
[WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
    mon.osd-3 (rank 4) addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of quorum)
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
    pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87]
    pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107]
    pg 3.3cb is active+remapped+backfill_toofull, acting [100,90,82]
    pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106]
    pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114]

osd-1 # journalctl -f -u ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxx-1.service
-- Logs begin at Wed 2021-09-29 08:52:53 CEST. --
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"} v 0) v1
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"}]: dispatch
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "df", "detail": "detail"} v 0) v1
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.?
172.16.62.12:0/2081332311' entity='client.admin' cmd=[{"prefix": "df", "detail": "detail"}]: dispatch
Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:35 osd-1 bash[423735]: debug 2021-09-29T13:05:35.195+0000 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.045+0000 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting 85 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0)
Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.205+0000 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4)
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:46.215+0000 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 173 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0)
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c 29405655..29406327) collect timeout, calling fresh election
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to respond to cache pressure client_id: 56229355
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of quorum)
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87]
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107]
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is
active+remapped+backfill_toofull, acting [100,90,82]
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106]
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114]
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor election
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 7f6e854ba700 1 paxos.0).electionLogic(26610) init, last seen epoch 26610
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.533+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor election
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 7f6e854ba700 1 paxos.0).electionLogic(26613) init, last seen epoch 26613, mid-election, bumping
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop unexpected msg
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.551+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.554+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": "osd_memory_target"} v 0) v1
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": "osd_memory_target"}]: dispatch
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"} v 0) v1
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"}]: dispatch
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.572+0000 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, mons osd-1,osd-2,osd-5,osd-4,osd-3 in quorum (ranks 0,1,2,3,4)
Sep 29 15:05:52 osd-1 bash[423735]: debug 2021-09-29T13:05:52.830+0000 7f6e89cc3700 0
--1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> conn(0x55629242f000 0x556289dde000 :6789 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner accept peer addr is really - (socket is v1: 172.16.35.183:47888/0)
Sep 29 15:05:58 osd-1 bash[423735]: debug 2021-09-29T13:05:58.825+0000 7f6e894c2700 0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> conn(0x55629b6e8800 0x5562a32e3800 :6789 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner accept peer addr is really - (socket is v1: 172.16.35.182:42746/0)
Sep 29 15:06:03 osd-1 bash[423735]: debug 2021-09-29T13:05:59.667+0000 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 266 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0)
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c 29405655..29406327) collect timeout, calling fresh election
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to respond to cache pressure client_id: 56229355
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of quorum)
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87]
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107]
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is active+remapped+backfill_toofull, acting [100,90,82]
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106]
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114]
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 7f6e854ba700 0
log_channel(cluster) log [INF] : mon.osd-1 calling monitor election
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 7f6e854ba700 1 paxos.0).electionLogic(26616) init, last seen epoch 26616
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.064+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop unexpected msg
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "status", "format": "json-pretty"} v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? 172.16.62.11:0/4154945587' entity='client.admin' cmd=[{"prefix": "status", "format": "json-pretty"}]: dispatch
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.068+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.072+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.082+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": "osd_memory_target"} v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": "osd_memory_target"}]: dispatch
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"} v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"}]: dispatch
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.287+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.488+0000 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "df", "detail": "detail"} v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.?
172.16.62.11:0/1624876515' entity='client.admin' cmd=[{"prefix": "df", "detail": "detail"}]: dispatch
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.889+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:05 osd-1 bash[423735]: debug 2021-09-29T13:06:05.691+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.073+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.288+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.294+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.393+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:08 osd-1 bash[423735]: debug 2021-09-29T13:06:08.216+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.034+0000 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting 289 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0)
Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.064+0000 7f6e87cbf700 1 paxos.0).electionLogic(26617) init, last seen epoch 26617, mid-election, bumping
Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.087+0000 7f6e87cbf700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor election
Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 7f6e854ba700 1 paxos.0).electionLogic(26621) init, last seen epoch 26621, mid-election, bumping
Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.110+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.038+0000 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting 289 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0)
Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.123+0000 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4)
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:22.796+0000 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 423 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0)
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c 29405655..29406327) collect timeout, calling fresh election
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail:
HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to respond to cache pressure client_id: 56229355
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of quorum)
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87]
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107]
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is active+remapped+backfill_toofull, acting [100,90,82]
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106]
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114]
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor election
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 7f6e854ba700 1 paxos.0).electionLogic(26624) init, last seen epoch 26624
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.253+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.254+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop unexpected msg
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.256+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.258+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": "osd_memory_target"} v 0) v1
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000
7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": "osd_memory_target"}]: dispatch
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"} v 0) v1
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"}]: dispatch
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor election
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 7f6e854ba700 1 paxos.0).electionLogic(26627) init, last seen epoch 26627, mid-election, bumping
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.282+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.050+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.250+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.651+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id

osd-1 # journalctl -f -u ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxxxxxxxxx-1.qkzuas.service
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdc463500 client_session(request_renewcaps seq 88463) from client.60598827 v1: 172.16.59.39:0/1389838619
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece3a0cfc0 client_session(request_renewcaps seq 88463) from client.60598821 v1: 172.16.59.39:0/858534994
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece1e24540 client_session(request_renewcaps seq 88459) from client.60591845 v1: 172.16.59.7:0/1705034209
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece055f340 client_session(request_renewcaps seq 88462) from client.60598851 v1: 172.16.59.26:0/763945533
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdcb97c00 client_session(request_renewcaps seq 88459) from client.60591994 v1: 172.16.59.7:0/4158829178
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdfa9bc00 client_session(request_renewcaps seq 86286) from client.60712226 v1: 172.16.59.64:0/1098377799
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ec336dc000 client_session(request_renewcaps seq 88463) from client.60591563 v1: 172.16.59.39:0/1765846930
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdae976c0
client_session(request_renewcaps seq 86592) from client.60695401 v1: 172.16.59.27:0/2213843285
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdf211a40 client_session(request_renewcaps seq 88461) from client.60599085 v1: 172.16.59.19:0/1476359719
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdec1d340 client_session(request_renewcaps seq 88463) from client.60591566 v1: 172.16.59.39:0/3197981635
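P.S. For completeness, this is roughly what my workaround looked like. Please read it as a sketch: the systemd unit name below is my reconstruction of the usual cephadm naming scheme (ceph-<fsid>@<daemon>.service) rather than copied from the host, and on my nodes I actually just ran `docker stop` on the mon container directly.

# on the host running the rogue monitor (osd-1 in my case)
cephadm ls        # lists the cephadm-managed daemons and their containers
systemctl stop ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@mon.osd-1.service
# (or `docker stop <container>` on the mon container, which is what I did)

# once the remaining mons have settled into a stable quorum again
ceph orch daemon redeploy mon.osd-1
# or `ceph orch redeploy mon` for the whole mon service, which is the
# suggestion from the list that I followed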
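And this is what I would try to collect the next time the election loop shows up, to narrow down the root cause. The store.db path assumes the default cephadm layout under /var/lib/ceph/<fsid>/, and the chrony check assumes chronyd is the time source on the nodes, so adjust as needed:

# quorum and election state, asked from a host whose mon is still behaving
ceph quorum_status --format json-pretty
ceph mon stat

# inspect what the reported slow ops on the suspect mon are stuck on
cephadm enter --name mon.osd-1     # enter the running mon container on osd-1
ceph daemon mon.osd-1 ops          # then, inside the container, dump in-flight ops

# monitor store size (an oversized store.db can make paxos and elections crawl)
du -sh /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/mon.osd-1/store.db

# clock skew between the mon hosts (assuming chrony)
chronyc sources -v

# temporarily raise mon/paxos verbosity while the loop is active
ceph config set mon debug_mon 10
ceph config set mon debug_paxos 10

If anyone can confirm whether this is a sensible list, or has a better idea where to look, I would be grateful.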