Dear all,

I was a bit too optimistic in my previous email. It looks like the leader election loop reappeared. I was able to work around it by stopping the rogue mon daemon, but I don't know how to fix it for good.

I'm running a 16.2.6 Ceph cluster on CentOS 7.9 servers (6 servers in total). Each server has about 35 HDDs and 4 SSDs. The servers have about 250 GB of RAM, and there is no memory pressure on any daemon. I have an identical mirror cluster that does not show the issue (but that one does not have its file system mounted elsewhere and runs no rgws). I recently migrated both clusters to cephadm and then from Octopus to Pacific.

The primary cluster has the following problems (the output below was pulled from the cluster before fixing/restarting the mon daemon):

- `ceph -s` and other commands feel pretty sluggish
- `ceph -s` shows inconsistent results in the "health" section and the "services" overview
- CephFS clients hang, and after rebooting the clients, mounting is not possible any more
- `ceph config dump` prints "monclient: get_monmap_and_config failed to get config"
- there is a mon leader election loop; the journalctl output of the affected mon is at the bottom
- the primary mds daemon logs things like "skipping upkeep work because connection to Monitors appears laggy" and "ms_deliver_dispatch: unhandled message 0x55ecdec1d340 client_session(request_renewcaps seq 88463) from client.60591566 v1:172.16.59.39:0/3197981635" in its journalctl output

I tried rebooting the client that is supposedly not reacting to cache pressure, but that did not help either. The servers are connected to the same VLT switch pair and use LACP 2x40GbE for the cluster network and 2x10GbE for the public network. I have disabled firewalld on the nodes, but that did not fix the problem either. I suspect the "laggy monitors" are more likely caused on the software side than on the network side. I took down the rogue mon.osd-1 with `docker stop`, and the problem seems to disappear after that.

To summarize: I suspect the cause is connected to the mon daemons. I have found that similar problems have been reported a couple of times. What is the best way forward? The general suggestion for such cases seems to be to just run `ceph orch redeploy mon`, so I did this. Is there any way to find the root cause and get rid of it for good? (I have put a sketch of the workaround commands I used and the diagnostics I would run next time at the very end of this mail, below the log excerpts.)

Best wishes,
Manuel

osd-1 # ceph -s
  cluster:
    id:     55633ec3-6c0c-4a02-990c-0f87e0f7a01f
    health: HEALTH_WARN
            1 clients failing to respond to cache pressure
            1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
            Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull

  services:
    mon: 5 daemons, quorum (age 4h), out of quorum: osd-1, osd-2, osd-5, osd-4, osd-3
    mgr: osd-4.oylrhe(active, since 2h), standbys: osd-1, osd-3, osd-5.jcfyqe, osd-2
    mds: 1/1 daemons up, 1 standby
    osd: 180 osds: 180 up (since 4h), 164 in (since 6h); 285 remapped pgs
    rgw: 12 daemons active (6 hosts, 2 zones)

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 5322 pgs
    objects: 263.18M objects, 944 TiB
    usage:   1.4 PiB used, 639 TiB / 2.0 PiB avail
    pgs:     25576348/789544299 objects misplaced (3.239%)
             5026 active+clean
             291  active+remapped+backfilling
             5    active+remapped+backfill_toofull

  io:
    client:   165 B/s wr, 0 op/s rd, 0 op/s wr
    recovery: 2.3 GiB/s, 652 objects/s

  progress:
    Global Recovery Event (53m)
      [==========================..]
      (remaining: 3m)

osd-1 # ceph health detail
HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
[WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
    mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to respond to cache pressure client_id: 56229355
[WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
    mon.osd-3 (rank 4) addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of quorum)
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
    pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87]
    pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107]
    pg 3.3cb is active+remapped+backfill_toofull, acting [100,90,82]
    pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106]
    pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114]

osd-1 # journalctl -f -u ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxx-1.service
-- Logs begin at Wed 2021-09-29 08:52:53 CEST. --
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"} v 0) v1
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"}]: dispatch
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "df", "detail": "detail"} v 0) v1
Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.?
172.16.62.12:0/2081332311' entity='client.admin' cmd=[{"prefix": "df", "detail": "detail"}]: dispatch
Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:35 osd-1 bash[423735]: debug 2021-09-29T13:05:35.195+0000 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.045+0000 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting 85 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0)
Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.205+0000 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4)
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:46.215+0000 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 173 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0)
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c 29405655..29406327) collect timeout, calling fresh election
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to respond to cache pressure client_id: 56229355
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of quorum)
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87]
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107]
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is
active+remapped+backfill_toofull, acting [100,90,82]
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106]
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114]
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor election
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 7f6e854ba700 1 paxos.0).electionLogic(26610) init, last seen epoch 26610
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.533+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor election
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 7f6e854ba700 1 paxos.0).electionLogic(26613) init, last seen epoch 26613, mid-election, bumping
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop unexpected msg
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.551+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.554+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": "osd_memory_target"} v 0) v1
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": "osd_memory_target"}]: dispatch
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"} v 0) v1
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"}]: dispatch
Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.572+0000 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, mons osd-1,osd-2,osd-5,osd-4,osd-3 in quorum (ranks 0,1,2,3,4)
Sep 29 15:05:52 osd-1 bash[423735]: debug 2021-09-29T13:05:52.830+0000 7f6e89cc3700 0
--1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> conn(0x55629242f000 0x556289dde000 :6789 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner accept peer addr is really - (socket is v1: 172.16.35.183:47888/0)
Sep 29 15:05:58 osd-1 bash[423735]: debug 2021-09-29T13:05:58.825+0000 7f6e894c2700 0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> conn(0x55629b6e8800 0x5562a32e3800 :6789 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner accept peer addr is really - (socket is v1: 172.16.35.182:42746/0)
Sep 29 15:06:03 osd-1 bash[423735]: debug 2021-09-29T13:05:59.667+0000 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 266 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0)
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c 29405655..29406327) collect timeout, calling fresh election
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to respond to cache pressure client_id: 56229355
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of quorum)
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87]
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107]
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is active+remapped+backfill_toofull, acting [100,90,82]
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106]
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114]
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 7f6e854ba700 0
log_channel(cluster) log [INF] : mon.osd-1 calling monitor election
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 7f6e854ba700 1 paxos.0).electionLogic(26616) init, last seen epoch 26616
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.064+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop unexpected msg
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "status", "format": "json-pretty"} v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? 172.16.62.11:0/4154945587' entity='client.admin' cmd=[{"prefix": "status", "format": "json-pretty"}]: dispatch
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.068+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.072+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.082+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": "osd_memory_target"} v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": "osd_memory_target"}]: dispatch
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"} v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"}]: dispatch
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.287+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.488+0000 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "df", "detail": "detail"} v 0) v1
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.?
172.16.62.11:0/1624876515' entity='client.admin' cmd=[{"prefix": "df", "detail": "detail"}]: dispatch
Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.889+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:05 osd-1 bash[423735]: debug 2021-09-29T13:06:05.691+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.073+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.288+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.294+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.393+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:08 osd-1 bash[423735]: debug 2021-09-29T13:06:08.216+0000 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.034+0000 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting 289 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0)
Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.064+0000 7f6e87cbf700 1 paxos.0).electionLogic(26617) init, last seen epoch 26617, mid-election, bumping
Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.087+0000 7f6e87cbf700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor election
Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 7f6e854ba700 1 paxos.0).electionLogic(26621) init, last seen epoch 26621, mid-election, bumping
Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.110+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.038+0000 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting 289 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0)
Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.123+0000 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4)
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:22.796+0000 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 423 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0)
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c 29405655..29406327) collect timeout, calling fresh election
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail:
HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to respond to cache pressure client_id: 56229355
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of quorum)
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87]
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107]
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is active+remapped+backfill_toofull, acting [100,90,82]
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106]
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114]
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor election
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 7f6e854ba700 1 paxos.0).electionLogic(26624) init, last seen epoch 26624
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.253+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.254+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop unexpected msg
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.256+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.258+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": "osd_memory_target"} v 0) v1
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000
7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": "osd_memory_target"}]: dispatch
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"} v 0) v1
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": "osd_memory_target"}]: dispatch
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor election
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 7f6e854ba700 1 paxos.0).electionLogic(26627) init, last seen epoch 26627, mid-election, bumping
Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.282+0000 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no unique device id for md126: fallback method has no model nor serial'
Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.050+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.250+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id
Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.651+0000 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to assign global_id

osd-1 # journalctl -f -u ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxxxxxxxxx-1.qkzuas.service
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdc463500 client_session(request_renewcaps seq 88463) from client.60598827 v1: 172.16.59.39:0/1389838619
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece3a0cfc0 client_session(request_renewcaps seq 88463) from client.60598821 v1: 172.16.59.39:0/858534994
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece1e24540 client_session(request_renewcaps seq 88459) from client.60591845 v1: 172.16.59.7:0/1705034209
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece055f340 client_session(request_renewcaps seq 88462) from client.60598851 v1: 172.16.59.26:0/763945533
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdcb97c00 client_session(request_renewcaps seq 88459) from client.60591994 v1: 172.16.59.7:0/4158829178
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdfa9bc00 client_session(request_renewcaps seq 86286) from client.60712226 v1: 172.16.59.64:0/1098377799
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ec336dc000 client_session(request_renewcaps seq 88463) from client.60591563 v1: 172.16.59.39:0/1765846930
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdae976c0
client_session(request_renewcaps seq 86592) from client.60695401 v1: 172.16.59.27:0/2213843285
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdf211a40 client_session(request_renewcaps seq 88461) from client.60599085 v1: 172.16.59.19:0/1476359719
Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdec1d340 client_session(request_renewcaps seq 88463) from client.60591566 v1: 172.16.59.39:0/3197981635
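P.S. For completeness, this is roughly what my workaround looked like. Please read it as a sketch: the systemd unit name below is my reconstruction of the usual cephadm naming scheme (ceph-<fsid>@<daemon>.service) rather than copied from the host, and on my nodes I actually just ran `docker stop` on the mon container directly.

# on the host running the rogue monitor (osd-1 in my case)
cephadm ls        # lists the cephadm-managed daemons and their containers
systemctl stop ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@mon.osd-1.service
# (or `docker stop <container>` on the mon container, which is what I did)

# once the remaining mons have settled into a stable quorum again
ceph orch daemon redeploy mon.osd-1
# or `ceph orch redeploy mon` for the whole mon service, which is the
# suggestion from the list that I followed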
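And this is what I would try to collect the next time the election loop shows up, to narrow down the root cause. The store.db path assumes the default cephadm layout under /var/lib/ceph/<fsid>/, and the chrony check assumes chronyd is the time source on the nodes, so adjust as needed:

# quorum and election state, asked from a host whose mon is still behaving
ceph quorum_status --format json-pretty
ceph mon stat

# inspect what the reported slow ops on the suspect mon are stuck on
cephadm enter --name mon.osd-1     # enter the running mon container on osd-1
ceph daemon mon.osd-1 ops          # then, inside the container, dump in-flight ops

# monitor store size (an oversized store.db can make paxos and elections crawl)
du -sh /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/mon.osd-1/store.db

# clock skew between the mon hosts (assuming chrony)
chronyc sources -v

# temporarily raise mon/paxos verbosity while the loop is active
ceph config set mon debug_mon 10
ceph config set mon debug_paxos 10

If anyone can confirm whether this is a sensible list, or has a better idea where to look, I would be grateful.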