Hi, thanks for the suggestion. If I get a rogue MON again, I'll try this. I'll also need to figure out how to pull the metadata from the host; it might be visible with `docker inspect`. Cheers, On Wed, Sep 29, 2021 at 6:06 PM <DHilsbos@xxxxxxxxxxxxxx> wrote: > Manuel; > > Reading through this mailing list this morning, I can't help but mentally > connect your issue to Javier's issue, in part because you're both running > 16.2.6. > > Javier's issue seems to be that OSDs aren't registering public / cluster > network addresses correctly. His most recent message indicates that he > pulled the OSD metadata and found the addresses incorrect there. > > I wonder if your rogue MON might have IP addresses registered wrong. I > don't know how to get the metadata, but if you could, that might provide > insight. It might also be interesting to extract the current monmap and see > what that says. > > My thoughts, probably not even worth 2 cents, but there you go. > > Thank you, > > Dominic L. Hilsbos, MBA > Vice President - Information Technology > Perform Air International Inc. > DHilsbos@xxxxxxxxxxxxxx > www.PerformAir.com > > -----Original Message----- > From: Manuel Holtgrewe [mailto:zyklenfrei@xxxxxxxxx] > Sent: Wednesday, September 29, 2021 6:43 AM > To: ceph-users > Subject: Leader election loop reappears > > Dear all, > > I was a bit too optimistic in my previous email. It looks like the leader > election loop has reappeared. I could fix it by stopping the rogue mon daemon, > but I don't know how to fix it for good. > > I'm running a 16.2.6 Ceph cluster on CentOS 7.9 servers (6 servers in > total). I have about 35 HDDs and 4 SSDs in each server. The servers have > about 250 GB of RAM, and there is no memory pressure on any daemon. I have an > identical mirror cluster that does not have the issue (but that one does > not have its file system mounted elsewhere and is running no rgws). 
I have > migrated both clusters recently to cephadm and then from octopus to > pacific. > > The primary cluster has the following problems (pulled from the cluster before > fixing/restarting the mon daemon): > > - `ceph -s` and other commands feel pretty sluggish > - `ceph -s` shows inconsistent results in the "health" section and > "services" overview > - cephfs clients hang, and after rebooting the clients, mounting is not > possible anymore > - `ceph config dump` prints "monclient: get_monmap_and_config failed to get > config" > - I have a mon leader election loop, shown in the journalctl output at the > bottom > - the primary mds daemon says things like "skipping upkeep work because > connection to Monitors appears laggy" and "ms_deliver_dispatch: unhandled > message 0x55ecdec1d340 client_session(request_renewcaps seq 88463) from > client.60591566 v1:172.16.59.39:0/3197981635" in its journalctl output > > I tried to reboot the client that is supposedly not reacting to cache > pressure, but that did not help either. The servers are connected to the > same VLT switch pair and use LACP 2x40GbE for the cluster and 2x10GbE for the > public network. I have disabled firewalld on the nodes, but that did not fix > the problem either. I suspect that the "laggy monitors" are more likely > caused on the software side than on the network side. > > I took down the rogue mon.osd-1 with `docker stop`, and it looks like the > problem disappears then. > > To summarize: I suspect the cause is connected to the mon daemons. I > have found that similar problems have been reported a couple of times. > > What is the best way forward? The general suggestion for such > cases seems to be to just "ceph orch redeploy mon", so I did this. > > Is there any way to find out the root cause and get rid of it? 
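[For reference, the mon metadata and monmap discussed above can be pulled with commands along these lines. This is a sketch against a live cluster; `osd-1` stands in for whichever mon is affected, and under cephadm the commands would typically be run inside a `cephadm shell`.]

```shell
# Registered addresses, container image, and host facts for one mon:
ceph mon metadata osd-1
# The monmap as currently agreed on by the quorum:
ceph mon dump
# If the mon itself is unresponsive, extract the monmap directly from
# its store (stop the mon daemon first):
ceph-mon -i osd-1 --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap
```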
> > Best wishes, > Manuel > > osd-1 # ceph -s > cluster: > id: 55633ec3-6c0c-4a02-990c-0f87e0f7a01f > health: HEALTH_WARN > 1 clients failing to respond to cache pressure > 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4 > Low space hindering backfill (add storage if this doesn't > resolve itself): 5 pgs backfill_toofull > > services: > mon: 5 daemons, quorum (age 4h), out of quorum: osd-1, osd-2, osd-5, > osd-4, osd-3 > mgr: osd-4.oylrhe(active, since 2h), standbys: osd-1, osd-3, > osd-5.jcfyqe, osd-2 > mds: 1/1 daemons up, 1 standby > osd: 180 osds: 180 up (since 4h), 164 in (since 6h); 285 remapped pgs > rgw: 12 daemons active (6 hosts, 2 zones) > > data: > volumes: 1/1 healthy > pools: 14 pools, 5322 pgs > objects: 263.18M objects, 944 TiB > usage: 1.4 PiB used, 639 TiB / 2.0 PiB avail > pgs: 25576348/789544299 objects misplaced (3.239%) > 5026 active+clean > 291 active+remapped+backfilling > 5 active+remapped+backfill_toofull > > io: > client: 165 B/s wr, 0 op/s rd, 0 op/s wr > recovery: 2.3 GiB/s, 652 objects/s > > progress: > Global Recovery Event (53m) > [==========================..] 
(remaining: 3m) > > osd-1 # ceph health detail > HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons down, > quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage > if this doesn't resolve itself): 5 pgs backfill_toofull > [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure > mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to > respond to cache pressure client_id: 56229355 > [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4 > mon.osd-3 (rank 4) addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0 > ] > is down (out of quorum) > [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this > doesn't resolve itself): 5 pgs backfill_toofull > pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87] > pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107] > pg 3.3cb is active+remapped+backfill_toofull, acting [100,90,82] > pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106] > pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114] > > > osd-1 # journalctl -f -u > ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxx-1.service > -- Logs begin at Wed 2021-09-29 08:52:53 CEST. 
-- > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 > 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "df", "detail": "detail"} v 0) v1 > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 > 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? 
> 172.16.62.12:0/2081332311' entity='client.admin' cmd=[{"prefix": "df", > "detail": "detail"}]: dispatch > Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:35 osd-1 bash[423735]: debug 2021-09-29T13:05:35.195+0000 > 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.045+0000 > 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting 85 > slow ops, oldest is mon_command([{prefix=config-key set, > key=mgr/cephadm/host.osd-2}] v 0) > Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.205+0000 > 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, > mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4) > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:46.215+0000 > 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 173 > slow ops, oldest is mon_command([{prefix=config-key set, > key=mgr/cephadm/host.osd-2}] v 0) > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c > 29405655..29406327) collect timeout, calling fresh election > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN > 1 clients failing to respond to cache pressure; 1/5 mons down, quorum > osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this > 
doesn't resolve itself): 5 pgs backfill_toofull > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 > clients failing to respond to cache pressure > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : > mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to > respond to cache pressure client_id: 56229355 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons > down, quorum osd-1,osd-2,osd-5,osd-4 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) > addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of > quorum) > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: > Low space hindering backfill (add storage if this doesn't resolve itself): > 5 pgs backfill_toofull > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is > active+remapped+backfill_toofull, acting [145,128,87] > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is > active+remapped+backfill_toofull, acting [133,24,107] > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is > active+remapped+backfill_toofull, acting [100,90,82] > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is > active+remapped+backfill_toofull, acting [155,27,106] > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 
log_channel(cluster) log [WRN] : pg 3.665 is > active+remapped+backfill_toofull, acting [153,73,114] > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor > election > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 > 7f6e854ba700 1 paxos.0).electionLogic(26610) init, last seen epoch 26610 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.533+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for md126: fallback method has no model nor serial' > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor > election > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 > 7f6e854ba700 1 paxos.0).electionLogic(26613) init, last seen epoch 26613, > mid-election, bumping > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for md126: fallback method has no model nor serial' > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop unexpected > msg > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.551+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.554+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 > 7f6e894c2700 1 
mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 > 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 > 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.572+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, > mons osd-1,osd-2,osd-5,osd-4,osd-3 in quorum (ranks 0,1,2,3,4) > Sep 29 15:05:52 osd-1 bash[423735]: debug 2021-09-29T13:05:52.830+0000 > 7f6e89cc3700 0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> > conn(0x55629242f000 0x556289dde000 :6789 s=ACCEPTING pgs=0 cs=0 > l=0).handle_client_banner accept peer addr is really - (socket is v1: > 172.16.35.183:47888/0) > Sep 29 15:05:58 osd-1 bash[423735]: debug 2021-09-29T13:05:58.825+0000 > 7f6e894c2700 0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> > conn(0x55629b6e8800 0x5562a32e3800 :6789 s=ACCEPTING pgs=0 cs=0 > l=0).handle_client_banner accept peer addr is really - (socket is v1: > 172.16.35.182:42746/0) > Sep 29 15:06:03 osd-1 bash[423735]: debug 2021-09-29T13:05:59.667+0000 > 7f6e854ba700 0 mon.osd-1@0(leader) e11 
handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 266 > slow ops, oldest is mon_command([{prefix=config-key set, > key=mgr/cephadm/host.osd-2}] v 0) > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c > 29405655..29406327) collect timeout, calling fresh election > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN > 1 clients failing to respond to cache pressure; 1/5 mons down, quorum > osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this > doesn't resolve itself): 5 pgs backfill_toofull > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 > clients failing to respond to cache pressure > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : > mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to > respond to cache pressure client_id: 56229355 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons > down, quorum osd-1,osd-2,osd-5,osd-4 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) > addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of > quorum) > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: > Low space hindering backfill (add storage if this doesn't resolve itself): > 5 pgs backfill_toofull > Sep 29 15:06:04 
osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is > active+remapped+backfill_toofull, acting [145,128,87] > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is > active+remapped+backfill_toofull, acting [133,24,107] > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is > active+remapped+backfill_toofull, acting [100,90,82] > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is > active+remapped+backfill_toofull, acting [155,27,106] > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is > active+remapped+backfill_toofull, acting [153,73,114] > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor > election > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 > 7f6e854ba700 1 paxos.0).electionLogic(26616) init, last seen epoch 26616 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.064+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for md126: fallback method has no model nor serial' > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop unexpected > msg > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "status", "format": "json-pretty"} v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 > 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? 
> 172.16.62.11:0/4154945587' entity='client.admin' cmd=[{"prefix": "status", > "format": "json-pretty"}]: dispatch > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.068+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.072+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.082+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 > 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 > 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.287+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.488+0000 > 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "df", "detail": 
"detail"} v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 > 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? > 172.16.62.11:0/1624876515' entity='client.admin' cmd=[{"prefix": "df", > "detail": "detail"}]: dispatch > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.889+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:05 osd-1 bash[423735]: debug 2021-09-29T13:06:05.691+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.073+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.288+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.294+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.393+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:08 osd-1 bash[423735]: debug 2021-09-29T13:06:08.216+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.034+0000 > 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting 289 > slow ops, oldest is mon_command([{prefix=config-key set, > key=mgr/cephadm/host.osd-2}] v 0) > Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.064+0000 > 7f6e87cbf700 1 paxos.0).electionLogic(26617) init, last seen epoch 26617, > mid-election, bumping > Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.087+0000 > 7f6e87cbf700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for 
md126: fallback method has no model nor serial' > Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor > election > Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 > 7f6e854ba700 1 paxos.0).electionLogic(26621) init, last seen epoch 26621, > mid-election, bumping > Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.110+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for md126: fallback method has no model nor serial' > Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.038+0000 > 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting 289 > slow ops, oldest is mon_command([{prefix=config-key set, > key=mgr/cephadm/host.osd-2}] v 0) > Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.123+0000 > 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, > mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4) > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:22.796+0000 > 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 423 > slow ops, oldest is mon_command([{prefix=config-key set, > key=mgr/cephadm/host.osd-2}] v 0) > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c > 29405655..29406327) collect timeout, calling fresh election > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN > 1 clients failing to respond to cache pressure; 1/5 mons down, quorum > osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this > doesn't 
resolve itself): 5 pgs backfill_toofull > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 > clients failing to respond to cache pressure > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : > mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to > respond to cache pressure client_id: 56229355 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons > down, quorum osd-1,osd-2,osd-5,osd-4 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) > addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of > quorum) > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: > Low space hindering backfill (add storage if this doesn't resolve itself): > 5 pgs backfill_toofull > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is > active+remapped+backfill_toofull, acting [145,128,87] > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is > active+remapped+backfill_toofull, acting [133,24,107] > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is > active+remapped+backfill_toofull, acting [100,90,82] > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is > active+remapped+backfill_toofull, acting [155,27,106] > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 
log_channel(cluster) log [WRN] : pg 3.665 is > active+remapped+backfill_toofull, acting [153,73,114] > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor > election > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 > 7f6e854ba700 1 paxos.0).electionLogic(26624) init, last seen epoch 26624 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.253+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for md126: fallback method has no model nor serial' > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.254+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop unexpected > msg > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.256+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.258+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 > 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 > 7f6e854ba700 0 
log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor > election > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 > 7f6e854ba700 1 paxos.0).electionLogic(26627) init, last seen epoch 26627, > mid-election, bumping > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.282+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for md126: fallback method has no model nor serial' > Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.050+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.250+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.651+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > > osd-1 # journalctl -f -u > ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxxxxxxxxx-1.qkzuas.service > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdc463500 > client_session(request_renewcaps seq 88463) from client.60598827 v1: > 172.16.59.39:0/1389838619 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece3a0cfc0 > client_session(request_renewcaps seq 88463) from client.60598821 v1: > 172.16.59.39:0/858534994 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece1e24540 > client_session(request_renewcaps seq 88459) from 
client.60591845 v1: > 172.16.59.7:0/1705034209 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece055f340 > client_session(request_renewcaps seq 88462) from client.60598851 v1: > 172.16.59.26:0/763945533 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdcb97c00 > client_session(request_renewcaps seq 88459) from client.60591994 v1: > 172.16.59.7:0/4158829178 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdfa9bc00 > client_session(request_renewcaps seq 86286) from client.60712226 v1: > 172.16.59.64:0/1098377799 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ec336dc000 > client_session(request_renewcaps seq 88463) from client.60591563 v1: > 172.16.59.39:0/1765846930 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdae976c0 > client_session(request_renewcaps seq 86592) from client.60695401 v1: > 172.16.59.27:0/2213843285 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdf211a40 > client_session(request_renewcaps seq 88461) from client.60599085 v1: > 172.16.59.19:0/1476359719 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdec1d340 > client_session(request_renewcaps seq 88463) from client.60591566 v1: > 172.16.59.39:0/3197981635 > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx > To unsubscribe send an email to ceph-users-leave@xxxxxxx > > _______________________________________________ ceph-users mailing list -- 
ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx