Hi, thanks for the suggestion. If I get a rogue MON again, I'll try this. I'll also need to figure out how to pull the metadata from the host; it might be visible with `docker inspect`. Cheers, On Wed, Sep 29, 2021 at 6:06 PM <DHilsbos@xxxxxxxxxxxxxx> wrote: > Manuel; > > Reading through this mailing list this morning, I can't help but mentally > connect your issue to Javier's issue, in part because you're both running > 16.2.6. > > Javier's issue seems to be that OSDs aren't registering public / cluster > network addresses correctly. His most recent message indicates that he > pulled the OSD metadata and found the addresses incorrect there. > > I wonder if your rogue MON might have IP addresses registered wrong. I > don't know how to get the metadata, but if you could, that might provide > insight. It might also be interesting to extract the current monmap and see > what that says. > > My thoughts, probably not even worth 2 cents, but there you go. > > Thank you, > > Dominic L. Hilsbos, MBA > Vice President - Information Technology > Perform Air International Inc. > DHilsbos@xxxxxxxxxxxxxx > www.PerformAir.com > > -----Original Message----- > From: Manuel Holtgrewe [mailto:zyklenfrei@xxxxxxxxx] > Sent: Wednesday, September 29, 2021 6:43 AM > To: ceph-users > Subject: Leader election loop reappears > > Dear all, > > I was a bit too optimistic in my previous email. It looks like the leader > election loop has reappeared. I could fix it by stopping the rogue mon daemon, > but I don't know how to fix it for good. > > I'm running a 16.2.6 Ceph cluster on CentOS 7.9 servers (6 servers in > total). I have about 35 HDDs and 4 SSDs in each server. The servers have > about 250 GB of RAM, and there is no memory pressure on any daemon. I have an > identical mirror cluster that does not have the issue (but that one does > not have its file system mounted elsewhere and is running no rgws). 
I have > migrated both clusters recently to cephadm and then from octopus to > pacific. > > The primary cluster has the following problems (pulled from the cluster before > fixing/restarting the mon daemon): > > - `ceph -s` and other commands feel pretty sluggish > - `ceph -s` shows inconsistent results in the "health" section and > "services" overview > - cephfs clients hang, and after rebooting the clients, mounting is not > possible anymore > - `ceph config dump` prints "monclient: get_monmap_and_config failed to get > config" > - I have a mon leader election loop, shown in the journalctl output at the > bottom > - the primary mds daemon says things like "skipping upkeep work because > connection to Monitors appears laggy" and "ms_deliver_dispatch: unhandled > message 0x55ecdec1d340 client_session(request_renewcaps seq 88463) from > client.60591566 v1:172.16.59.39:0/3197981635" in its journalctl output > > I tried to reboot the client that is supposedly not reacting to cache > pressure, but that did not help either. The servers are connected to the > same VLT switch pair and use LACP 2x40GbE for the cluster and 2x10GbE for the > public network. I have disabled firewalld on the nodes, but that did not fix > the problem either. I suspect that the "laggy monitors" are more likely > caused on the software side than on the network side. > > I took down the rogue mon.osd-1 with `docker stop`, and it looks like the > problem disappears then. > > To summarize: I suspect the cause is connected to the mon daemons. I > have found that similar problems have been reported a couple of times. > > What is the best way forward? The general suggestion for such > cases seems to be to just "ceph orch redeploy mon", so I did this. > > Is there any way to find out the root cause and get rid of it? 
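[For reference, the mon metadata and monmap discussed above can be pulled with commands along these lines. This is a sketch against a live cluster; `osd-1` stands in for whichever mon is affected, and under cephadm the commands would typically be run inside a `cephadm shell`.]

```shell
# Registered addresses, container image, and host facts for one mon:
ceph mon metadata osd-1
# The monmap as currently agreed on by the quorum:
ceph mon dump
# If the mon itself is unresponsive, extract the monmap directly from
# its store (stop the mon daemon first):
ceph-mon -i osd-1 --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap
```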
> > Best wishes, > Manuel > > osd-1 # ceph -s > cluster: > id: 55633ec3-6c0c-4a02-990c-0f87e0f7a01f > health: HEALTH_WARN > 1 clients failing to respond to cache pressure > 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4 > Low space hindering backfill (add storage if this doesn't > resolve itself): 5 pgs backfill_toofull > > services: > mon: 5 daemons, quorum (age 4h), out of quorum: osd-1, osd-2, osd-5, > osd-4, osd-3 > mgr: osd-4.oylrhe(active, since 2h), standbys: osd-1, osd-3, > osd-5.jcfyqe, osd-2 > mds: 1/1 daemons up, 1 standby > osd: 180 osds: 180 up (since 4h), 164 in (since 6h); 285 remapped pgs > rgw: 12 daemons active (6 hosts, 2 zones) > > data: > volumes: 1/1 healthy > pools: 14 pools, 5322 pgs > objects: 263.18M objects, 944 TiB > usage: 1.4 PiB used, 639 TiB / 2.0 PiB avail > pgs: 25576348/789544299 objects misplaced (3.239%) > 5026 active+clean > 291 active+remapped+backfilling > 5 active+remapped+backfill_toofull > > io: > client: 165 B/s wr, 0 op/s rd, 0 op/s wr > recovery: 2.3 GiB/s, 652 objects/s > > progress: > Global Recovery Event (53m) > [==========================..] 
(remaining: 3m) > > osd-1 # ceph health detail > HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons down, > quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage > if this doesn't resolve itself): 5 pgs backfill_toofull > [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure > mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to > respond to cache pressure client_id: 56229355 > [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4 > mon.osd-3 (rank 4) addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0 > ] > is down (out of quorum) > [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this > doesn't resolve itself): 5 pgs backfill_toofull > pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87] > pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107] > pg 3.3cb is active+remapped+backfill_toofull, acting [100,90,82] > pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106] > pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114] > > > osd-1 # journalctl -f -u > ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxx-1.service > -- Logs begin at Wed 2021-09-29 08:52:53 CEST. 
-- > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 > 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "df", "detail": "detail"} v 0) v1 > Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 > 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? 
> 172.16.62.12:0/2081332311' entity='client.admin' cmd=[{"prefix": "df", > "detail": "detail"}]: dispatch > Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:35 osd-1 bash[423735]: debug 2021-09-29T13:05:35.195+0000 > 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.045+0000 > 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting 85 > slow ops, oldest is mon_command([{prefix=config-key set, > key=mgr/cephadm/host.osd-2}] v 0) > Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.205+0000 > 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, > mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4) > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:46.215+0000 > 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 173 > slow ops, oldest is mon_command([{prefix=config-key set, > key=mgr/cephadm/host.osd-2}] v 0) > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c > 29405655..29406327) collect timeout, calling fresh election > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN > 1 clients failing to respond to cache pressure; 1/5 mons down, quorum > osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this > 
doesn't resolve itself): 5 pgs backfill_toofull > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 > clients failing to respond to cache pressure > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : > mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to > respond to cache pressure client_id: 56229355 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons > down, quorum osd-1,osd-2,osd-5,osd-4 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) > addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of > quorum) > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: > Low space hindering backfill (add storage if this doesn't resolve itself): > 5 pgs backfill_toofull > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is > active+remapped+backfill_toofull, acting [145,128,87] > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is > active+remapped+backfill_toofull, acting [133,24,107] > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is > active+remapped+backfill_toofull, acting [100,90,82] > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is > active+remapped+backfill_toofull, acting [155,27,106] > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 > 7f6e87cbf700 0 
log_channel(cluster) log [WRN] : pg 3.665 is > active+remapped+backfill_toofull, acting [153,73,114] > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor > election > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 > 7f6e854ba700 1 paxos.0).electionLogic(26610) init, last seen epoch 26610 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.533+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for md126: fallback method has no model nor serial' > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor > election > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 > 7f6e854ba700 1 paxos.0).electionLogic(26613) init, last seen epoch 26613, > mid-election, bumping > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for md126: fallback method has no model nor serial' > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop unexpected > msg > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.551+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.554+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 > 7f6e894c2700 1 
mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 > 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 > 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.572+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, > mons osd-1,osd-2,osd-5,osd-4,osd-3 in quorum (ranks 0,1,2,3,4) > Sep 29 15:05:52 osd-1 bash[423735]: debug 2021-09-29T13:05:52.830+0000 > 7f6e89cc3700 0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> > conn(0x55629242f000 0x556289dde000 :6789 s=ACCEPTING pgs=0 cs=0 > l=0).handle_client_banner accept peer addr is really - (socket is v1: > 172.16.35.183:47888/0) > Sep 29 15:05:58 osd-1 bash[423735]: debug 2021-09-29T13:05:58.825+0000 > 7f6e894c2700 0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> > conn(0x55629b6e8800 0x5562a32e3800 :6789 s=ACCEPTING pgs=0 cs=0 > l=0).handle_client_banner accept peer addr is really - (socket is v1: > 172.16.35.182:42746/0) > Sep 29 15:06:03 osd-1 bash[423735]: debug 2021-09-29T13:05:59.667+0000 > 7f6e854ba700 0 mon.osd-1@0(leader) e11 
handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 266 > slow ops, oldest is mon_command([{prefix=config-key set, > key=mgr/cephadm/host.osd-2}] v 0) > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c > 29405655..29406327) collect timeout, calling fresh election > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN > 1 clients failing to respond to cache pressure; 1/5 mons down, quorum > osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this > doesn't resolve itself): 5 pgs backfill_toofull > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 > clients failing to respond to cache pressure > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : > mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to > respond to cache pressure client_id: 56229355 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons > down, quorum osd-1,osd-2,osd-5,osd-4 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) > addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of > quorum) > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: > Low space hindering backfill (add storage if this doesn't resolve itself): > 5 pgs backfill_toofull > Sep 29 15:06:04 
osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is > active+remapped+backfill_toofull, acting [145,128,87] > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is > active+remapped+backfill_toofull, acting [133,24,107] > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is > active+remapped+backfill_toofull, acting [100,90,82] > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is > active+remapped+backfill_toofull, acting [155,27,106] > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is > active+remapped+backfill_toofull, acting [153,73,114] > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor > election > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 > 7f6e854ba700 1 paxos.0).electionLogic(26616) init, last seen epoch 26616 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.064+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for md126: fallback method has no model nor serial' > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop unexpected > msg > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "status", "format": "json-pretty"} v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 > 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? 
> 172.16.62.11:0/4154945587' entity='client.admin' cmd=[{"prefix": "status", > "format": "json-pretty"}]: dispatch > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.068+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.072+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.082+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 > 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 > 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.287+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.488+0000 > 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "df", "detail": 
"detail"} v 0) v1 > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 > 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? > 172.16.62.11:0/1624876515' entity='client.admin' cmd=[{"prefix": "df", > "detail": "detail"}]: dispatch > Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.889+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:05 osd-1 bash[423735]: debug 2021-09-29T13:06:05.691+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.073+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.288+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.294+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.393+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:08 osd-1 bash[423735]: debug 2021-09-29T13:06:08.216+0000 > 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.034+0000 > 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting 289 > slow ops, oldest is mon_command([{prefix=config-key set, > key=mgr/cephadm/host.osd-2}] v 0) > Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.064+0000 > 7f6e87cbf700 1 paxos.0).electionLogic(26617) init, last seen epoch 26617, > mid-election, bumping > Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.087+0000 > 7f6e87cbf700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for 
md126: fallback method has no model nor serial' > Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor > election > Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 > 7f6e854ba700 1 paxos.0).electionLogic(26621) init, last seen epoch 26621, > mid-election, bumping > Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.110+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for md126: fallback method has no model nor serial' > Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.038+0000 > 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting 289 > slow ops, oldest is mon_command([{prefix=config-key set, > key=mgr/cephadm/host.osd-2}] v 0) > Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.123+0000 > 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, > mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4) > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:22.796+0000 > 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 423 > slow ops, oldest is mon_command([{prefix=config-key set, > key=mgr/cephadm/host.osd-2}] v 0) > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c > 29405655..29406327) collect timeout, calling fresh election > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN > 1 clients failing to respond to cache pressure; 1/5 mons down, quorum > osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this > doesn't 
resolve itself): 5 pgs backfill_toofull > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 > clients failing to respond to cache pressure > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : > mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to > respond to cache pressure client_id: 56229355 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons > down, quorum osd-1,osd-2,osd-5,osd-4 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) > addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of > quorum) > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: > Low space hindering backfill (add storage if this doesn't resolve itself): > 5 pgs backfill_toofull > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is > active+remapped+backfill_toofull, acting [145,128,87] > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is > active+remapped+backfill_toofull, acting [133,24,107] > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is > active+remapped+backfill_toofull, acting [100,90,82] > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is > active+remapped+backfill_toofull, acting [155,27,106] > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 > 7f6e87cbf700 0 
log_channel(cluster) log [WRN] : pg 3.665 is > active+remapped+backfill_toofull, acting [153,73,114] > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor > election > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 > 7f6e854ba700 1 paxos.0).electionLogic(26624) init, last seen epoch 26624 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.253+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for md126: fallback method has no model nor serial' > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.254+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop unexpected > msg > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.256+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.258+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 > 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 > 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command > mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"} v 0) v1 > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 > 7f6e854ba700 0 
log_channel(audit) log [INF] : from='mgr.66351528 ' > entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": > "osd_memory_target"}]: dispatch > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 > 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor > election > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 > 7f6e854ba700 1 paxos.0).electionLogic(26627) init, last seen epoch 26627, > mid-election, bumping > Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.282+0000 > 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no > unique device id for md126: fallback method has no model nor serial' > Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.050+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.250+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.651+0000 > 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to > assign global_id > > osd-1 # journalctl -f -u > ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxxxxxxxxx-1.qkzuas.service > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdc463500 > client_session(request_renewcaps seq 88463) from client.60598827 v1: > 172.16.59.39:0/1389838619 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece3a0cfc0 > client_session(request_renewcaps seq 88463) from client.60598821 v1: > 172.16.59.39:0/858534994 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece1e24540 > client_session(request_renewcaps seq 88459) from 
client.60591845 v1: > 172.16.59.7:0/1705034209 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece055f340 > client_session(request_renewcaps seq 88462) from client.60598851 v1: > 172.16.59.26:0/763945533 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdcb97c00 > client_session(request_renewcaps seq 88459) from client.60591994 v1: > 172.16.59.7:0/4158829178 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdfa9bc00 > client_session(request_renewcaps seq 86286) from client.60712226 v1: > 172.16.59.64:0/1098377799 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ec336dc000 > client_session(request_renewcaps seq 88463) from client.60591563 v1: > 172.16.59.39:0/1765846930 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdae976c0 > client_session(request_renewcaps seq 86592) from client.60695401 v1: > 172.16.59.27:0/2213843285 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdf211a40 > client_session(request_renewcaps seq 88461) from client.60599085 v1: > 172.16.59.19:0/1476359719 > Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 > 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdec1d340 > client_session(request_renewcaps seq 88463) from client.60591566 v1: > 172.16.59.39:0/3197981635 > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx > To unsubscribe send an email to ceph-users-leave@xxxxxxx > > _______________________________________________ ceph-users mailing list -- 
ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx