After still more digging, I found high numbers of failed connection attempts on my osd nodes; see the netstat output at the bottom (nstat is also useful, as it allows resetting the counters). The number of failed connection attempts looked suspiciously high. I found an old thread on the mailing list that recommended enabling logging of reset connections to syslog:

```
iptables -I INPUT -p tcp -m tcp --tcp-flags RST RST -j LOG
```

This was very useful: I saw a lot of failed connection attempts to port 8443, so something related to the dashboard. I also noticed a lot of "beast" error messages, which appear to be related to RGW. So I stopped everything except for the bare essentials of mds, mgr, mon, and osd. The cluster appeared to stabilize after a full reboot. It is hard to judge whether this will hold for long - the problem sometimes appeared only after several hours.

Next, I deployed prometheus with `ceph orch` and everything remained OK. I then began to deploy rgw with `ceph orch` for my default realm, which caused no apparent problem. But once I deployed the ingress service for it with the following YAML:

```
service_type: ingress
service_id: rgw.default
placement:
  count: 6
spec:
  backend_service: rgw.default
  virtual_ip: 172.16.62.26/19
  frontend_port: 443
  monitor_port: 1967
  ssl_cert: |
    -----BEGIN PRIVATE KEY-----
    ...
```

I began seeing **a lot** of beast debug messages as follows:

```
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.096+0000 7f80a9398700 1 ====== req done req=0x7f81e6127620 op status=0 http_status=200 latency=0.000999998s ======
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.096+0000 7f80a9398700 1 beast: 0x7f81e6127620: 172.16.62.11 - anonymous [03/Oct/2021:09:21:00.095 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000999998s
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f80d1be9700 1 ====== starting new request req=0x7f81e6127620 =====
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f819d580700 0 ERROR: client_io->complete_request() returned Connection reset by peer
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f819d580700 1 ====== req done req=0x7f81e6127620 op status=0 http_status=200 latency=0.000000000s ======
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f819d580700 1 beast: 0x7f81e6127620: 172.16.62.12 - anonymous [03/Oct/2021:09:21:00.568 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000000000s
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.583+0000 7f80c1bc9700 1 ====== starting new request req=0x7f81e6127620 =====
```

and the TCP connection reset counters started to jump again (the monitors still remained stable). To me this indicates that haproxy is most probably the culprit behind the high number of "connection resets received", maybe unrelated to my cluster stability. Also note the health-check setting `option httpchk HEAD / HTTP/1.0`; the full haproxy and keepalived configs are below.

This leads me to the question:

- Is this normal / to be expected? I found this StackOverflow thread:
  https://stackoverflow.com/questions/21550337/haproxy-netty-way-to-prevent-exceptions-on-connection-reset/40005338#40005338

I now have the `ceph orch ls` output shown below and will wait overnight to see whether things remain stable. My feeling is that prometheus should not destabilize things, and I can live with the other services being disabled for a while. While waiting, I will keep sampling the TCP reset counters; a small sketch of how I do that follows, then the `ceph orch ls` output.
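This is just a minimal sketch, assuming iproute2's `nstat` is installed on the node; the counter names (`TcpAttemptFails`, `TcpEstabResets`, `TcpOutRsts`, `TcpRetransSegs`) are the raw `/proc/net/snmp` names behind the netstat lines shown further down, and the 10-second interval is an arbitrary choice of mine:

```
#!/usr/bin/env bash
# Sample the TCP failure/reset counters on this node.
# nstat reports the increment since its previous invocation, so the first
# call only seeds the history; the loop then prints per-interval deltas.
nstat -n    # --nooutput: update history only, print nothing
while sleep 10; do
    date
    # -z keeps counters with a zero delta in the output so lines stay comparable
    nstat -z TcpAttemptFails TcpEstabResets TcpOutRsts TcpRetransSegs
done
```

Together with the `iptables ... -j LOG` rule above, this should make it easier to correlate counter jumps with the RST sources logged to syslog.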
```
# ceph orch ls
NAME                 PORTS                  RUNNING  REFRESHED  AGE  PLACEMENT
ingress.rgw.default  172.16.62.26:443,1967  12/12    5m ago     16m  count:6
mds.cephfs                                  2/2      4m ago     4d   count-per-host:1;label:mds
mgr                                         5/5      5m ago     5d   count:5
mon                                         5/5      5m ago     2d   count:5
osd.unmanaged                               180/180  5m ago     -    <unmanaged>
prometheus           ?:9095                 2/2      3m ago     18m  count:2
rgw.default          ?:8000                 6/6      5m ago     25m  count-per-host:1;label:rgw
```

Cheers,
Manuel

```
# output of netstat -s | grep -A 10 ^Tcp:
+ ssh osd-1 netstat -s
Tcp:
    1043521 active connections openings
    449583 passive connection openings
    28923 failed connection attempts
    310376 connection resets received
    12100 connections established
    389101110 segments received
    590111283 segments send out
    722988 segments retransmited
    180 bad segments received.
    260749 resets sent
```

```
# cat /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/keepalived.rgw.default.osd-1.vrjiew/keepalived.conf
# This file is generated by cephadm.
vrrp_script check_backend {
    script "/usr/bin/curl http://localhost:1967/health"
    weight -20
    interval 2
    rise 2
    fall 2
}

vrrp_instance VI_0 {
  state MASTER
  priority 100
  interface bond0
  virtual_router_id 51
  advert_int 1
  authentication {
      auth_type PASS
      auth_pass qghwhcnanqsltihgtpsm
  }
  unicast_src_ip 172.16.62.10
  unicast_peer {
    172.16.62.11
    172.16.62.12
    172.16.62.13
    172.16.62.30
    172.16.62.31
  }
  virtual_ipaddress {
    172.16.62.26/19 dev bond0
  }
  track_script {
      check_backend
  }
}

# cat /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/haproxy.rgw.default.osd-1.urpnuu/haproxy/haproxy.cfg
# This file is generated by cephadm.
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/lib/haproxy/haproxy.pid
    maxconn 8000
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout queue 20s
    timeout connect 5s
    timeout http-request 1s
    timeout http-keep-alive 5s
    timeout client 1s
    timeout server 1s
    timeout check 5s
    maxconn 8000

frontend stats
    mode http
    bind *:1967
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth admin:ivlgujuagrksajemsqyg
    http-request use-service prometheus-exporter if { path /metrics }
    monitor-uri /health

frontend frontend
    bind *:443 ssl crt /var/lib/haproxy/haproxy.pem
    default_backend backend

backend backend
    option forwardfor
    balance static-rr
    option httpchk HEAD / HTTP/1.0
    server rgw.default.osd-1.xqrjwp 172.16.62.10:8000 check weight 100
    server rgw.default.osd-2.lopjij 172.16.62.11:8000 check weight 100
    server rgw.default.osd-3.plbqka 172.16.62.12:8000 check weight 100
    server rgw.default.osd-4.jvkhen 172.16.62.13:8000 check weight 100
    server rgw.default.osd-5.hjxnrb 172.16.62.30:8000 check weight 100
    server rgw.default.osd-6.bdrxdd 172.16.62.31:8000 check weight 100
```

On Sat, Oct 2, 2021 at 2:32 PM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:

> Dear all,
>
> I previously sent an email to the list regarding something that I called
> a "leader election" loop. The problem has reappeared several times and I
> don't know how to proceed with debugging or fixing this.
>
> I have 6 nodes osd-{1..6} and monitors are on osd-{1..5}. I run ceph
> 15.2.14 using cephadm on CentOS 7.9 (kernel 3.10.0-1160.42.2.el7.x86_64).
>
> The symptoms are (also see my previous email):
>
> - `ceph -s` takes a long time or does not return
>   --- I sometimes see messages "monclient: get_monmap_and_config failed to
>   get config"
>   --- I sometimes see messages "problem getting command descriptions from
>   mon.osd-2" (it always works with the admin socket, of course)
> - I sometimes see all daemons out of quorum in `ceph -s`
> - different monitors go out of quorum and come back in
> - leader election is reinitiated every few seconds
> - the monitors appear to go correctly between "electing" and "peon", but
>   the issue is that leader election is performed every few seconds...
>
> I have done all the checks in the "troubleshooting monitors" guide up to
> the point where it says "reach out to the community". In particular, I
> checked the mon_stats and each monitor sees all others on the correct
> public IP, and I can telnet to 3300 and 6789 from each monitor to all
> others.
>
> I have bumped the nf_conntrack settings, although I don't have any entries
> in the syslog yet about dropping packets. `netstat -s` shows a few dropped
> packets (e.g., 172 outgoing dropped, 18 dropped because of missing route).
>
> Also, I have added public servers and the cluster itself to chrony.conf
> (see below). The output of `chronyc sources -v` indicates to me that the
> cluster itself is in sync and clock skew is below 10 ns.
>
> I am able to inject the debug level 10/10 increase into the monitors; I
> had to repeat this for one out-of-quorum monitor that first said "Error
> ENXIO: problem getting command descriptions from mon.osd-5" but then
> accepted it via "ceph tell".
>
> I have pulled the logs for two minutes while the cluster was running its
> leader election loop and attached them. They are a couple of thousand
> lines each and should show the problem. I'd be happy to send fewer or
> more lines, though.
>
> I'd be happy about any help or suggestions towards a resolution.
>
> Best wishes,
> Manuel
>
> ```
> # 2>/dev/null sysctl -a | grep nf_ | egrep 'max|bucket'
> net.netfilter.nf_conntrack_buckets = 2500096
> net.netfilter.nf_conntrack_expect_max = 39060
> net.netfilter.nf_conntrack_max = 10000000
> net.netfilter.nf_conntrack_tcp_max_retrans = 3
> net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
> net.nf_conntrack_max = 10000000
> ```
>
> ```
> # from chrony.conf
> server 172.16.35.140 iburst
> server 172.16.35.141 iburst
> server 172.16.35.142 iburst
> server osd-1 iburst
> server osd-2 iburst
> server osd-3 iburst
> server osd-4 iburst
> server osd-5 iburst
> server osd-6 iburst
> server 0.de.pool.ntp.org iburst
> server 1.de.pool.ntp.org iburst
> server 2.de.pool.ntp.org iburst
> server 3.de.pool.ntp.org iburst
> ```
>
> ```
> # chronyc sources -v
> 210 Number of sources = 13
>
>   .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
>  / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
> | /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
> ||                                                 .- xxxx [ yyyy ] +/- zzzz
> ||      Reachability register (octal) -.           |  xxxx = adjusted offset,
> ||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
> ||                                \     |          |  zzzz = estimated error.
> ||                                 |    |            \
> MS Name/IP address         Stratum Poll Reach LastRx Last sample
> ===============================================================================
> ^- 172.16.35.140                 3   6   377    55    +213us[ +213us] +/-   26ms
> ^+ 172.16.35.141                 2   6   377    63    +807us[ +807us] +/-   12ms
> ^+ 172.16.35.142                 3   9   377   253   +1488us[+1488us] +/- 7675us
> ^+ osd-1                         3   6   377    62    +145us[ +145us] +/- 7413us
> ^+ osd-2                         2   6   377    61   -6577ns[-6577ns] +/- 8108us
> ^+ osd-3                         4   6   377    50    +509us[ +509us] +/- 6810us
> ^+ osd-4                         4   6   377    54    +447us[ +447us] +/- 7231us
> ^+ osd-5                         3   6   377    52    +252us[ +252us] +/- 6738us
> ^+ osd-6                         2   6   377    56     -13us[  -13us] +/- 8563us
> ^+ funky.f5s.de                  2   8   377   207    +371us[ +371us] +/-   24ms
> ^- hetzner01.ziegenberg.at       2  10   377   445    +735us[ +685us] +/-   32ms
> ^* time1.uni-paderborn.de        1   9   377   253   -4246us[-4297us] +/- 9089us
> ^- 25000-021.cloud.services>     2  10   377   147    +832us[ +832us] +/-   48ms
> ```
>
> On Wed, Sep 29, 2021 at 3:43 PM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx>
> wrote:
>
>> Dear all,
>>
>> I was a bit too optimistic in my previous email. It looks like the
>> leader election loop has reappeared. I could fix it by stopping the
>> rogue mon daemon, but I don't know how to fix it for good.
>>
>> I'm running a 16.2.6 Ceph cluster on CentOS 7.9 servers (6 servers in
>> total). I have about 35 HDDs in each server and 4 SSDs. The servers have
>> about 250 GB of RAM, and there is no memory pressure on any daemon. I
>> have an identical mirror cluster that does not have the issue (but that
>> one does not have its file system mounted elsewhere and is running no
>> rgws). I have recently migrated both clusters to cephadm and then from
>> octopus to pacific.
>>
>> The primary cluster has these problems (pulled from the cluster before
>> fixing/restarting the mon daemon):
>>
>> - `ceph -s` and other commands feel pretty sluggish
>> - `ceph -s` shows inconsistent results in the "health" section and the
>>   "services" overview
>> - cephfs clients hang, and after rebooting the clients, mounting is not
>>   possible any more
>> - `ceph config dump` prints "monclient: get_monmap_and_config failed to
>>   get config"
>> - I have a mon leader election loop, shown in its journalctl output at
>>   the bottom.
>> - the primary mds daemon says things like "skipping upkeep work because
>>   connection to Monitors appears laggy" and "ms_deliver_dispatch:
>>   unhandled message 0x55ecdec1d340 client_session(request_renewcaps seq
>>   88463) from client.60591566 v1:172.16.59.39:0/3197981635" in its
>>   journalctl output
>>
>> I tried to reboot the client that is supposedly not reacting to cache
>> pressure, but that did not help either. The servers are connected to the
>> same VLT switch pair and use LACP 2x40GbE for the cluster network and
>> 2x10GbE for the public network. I have disabled firewalld on the nodes,
>> but that did not fix the problem either. I suspect that the "laggy
>> monitors" are more likely caused on the software side than on the
>> network side.
>>
>> I took down the rogue mon.osd-1 with `docker stop` and it looks like the
>> problem disappears then.
>>
>> To summarize: I suspect the cause to be connected to the mon daemons. I
>> have found that similar problems have been reported a couple of times.
>>
>> What is the best way forward? It seems that the general suggestion for
>> such cases is to just "ceph orch redeploy mon", so I did this.
>>
>> Is there any way to find out the root cause to get rid of it?
>> >> Best wishes, >> Manuel >> >> osd-1 # ceph -s >> cluster: >> id: 55633ec3-6c0c-4a02-990c-0f87e0f7a01f >> health: HEALTH_WARN >> 1 clients failing to respond to cache pressure >> 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4 >> Low space hindering backfill (add storage if this doesn't >> resolve itself): 5 pgs backfill_toofull >> >> services: >> mon: 5 daemons, quorum (age 4h), out of quorum: osd-1, osd-2, osd-5, >> osd-4, osd-3 >> mgr: osd-4.oylrhe(active, since 2h), standbys: osd-1, osd-3, >> osd-5.jcfyqe, osd-2 >> mds: 1/1 daemons up, 1 standby >> osd: 180 osds: 180 up (since 4h), 164 in (since 6h); 285 remapped pgs >> rgw: 12 daemons active (6 hosts, 2 zones) >> >> data: >> volumes: 1/1 healthy >> pools: 14 pools, 5322 pgs >> objects: 263.18M objects, 944 TiB >> usage: 1.4 PiB used, 639 TiB / 2.0 PiB avail >> pgs: 25576348/789544299 objects misplaced (3.239%) >> 5026 active+clean >> 291 active+remapped+backfilling >> 5 active+remapped+backfill_toofull >> >> io: >> client: 165 B/s wr, 0 op/s rd, 0 op/s wr >> recovery: 2.3 GiB/s, 652 objects/s >> >> progress: >> Global Recovery Event (53m) >> [==========================..] (remaining: 3m) >> >> osd-1 # ceph health detail >> HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons >> down, quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add >> storage if this doesn't resolve itself): 5 pgs backfill_toofull >> [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure >> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to >> respond to cache pressure client_id: 56229355 >> [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4 >> mon.osd-3 (rank 4) addr [v2: >> 172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of quorum) >> [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this >> doesn't resolve itself): 5 pgs backfill_toofull >> pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87] >> pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107] >> pg 3.3cb is active+remapped+backfill_toofull, acting [100,90,82] >> pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106] >> pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114] >> >> >> osd-1 # journalctl -f -u >> ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxx-1.service >> -- Logs begin at Wed 2021-09-29 08:52:53 CEST. 
-- >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "df", "detail": "detail"} v 0) v1 >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 >> 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? >> 172.16.62.12:0/2081332311' entity='client.admin' cmd=[{"prefix": "df", >> "detail": "detail"}]: dispatch >> Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:35 osd-1 bash[423735]: debug 2021-09-29T13:05:35.195+0000 >> 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.045+0000 >> 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting >> 85 slow ops, oldest is mon_command([{prefix=config-key set, >> key=mgr/cephadm/host.osd-2}] v 0) >> Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.205+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, >> mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4) >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:46.215+0000 >> 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 173 >> slow ops, oldest is mon_command([{prefix=config-key set, >> key=mgr/cephadm/host.osd-2}] v 0) >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c >> 29405655..29406327) collect timeout, calling fresh election >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN >> 1 clients failing to respond to cache pressure; 1/5 mons down, quorum >> osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this >> doesn't resolve itself): 5 
pgs backfill_toofull >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 >> clients failing to respond to cache pressure >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : >> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to >> respond to cache pressure client_id: 56229355 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons >> down, quorum osd-1,osd-2,osd-5,osd-4 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) >> addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of >> quorum) >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: >> Low space hindering backfill (add storage if this doesn't resolve itself): >> 5 pgs backfill_toofull >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is >> active+remapped+backfill_toofull, acting [145,128,87] >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is >> active+remapped+backfill_toofull, acting [133,24,107] >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is >> active+remapped+backfill_toofull, acting [100,90,82] >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is >> active+remapped+backfill_toofull, acting [155,27,106] >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is >> active+remapped+backfill_toofull, acting [153,73,114] >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >> election >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 >> 7f6e854ba700 1 paxos.0).electionLogic(26610) init, last seen epoch 26610 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.533+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >> election >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 >> 7f6e854ba700 1 paxos.0).electionLogic(26613) init, last seen epoch 26613, >> mid-election, bumping >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop >> unexpected msg >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.551+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 
0) v1 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.554+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.572+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, >> mons osd-1,osd-2,osd-5,osd-4,osd-3 in quorum (ranks 0,1,2,3,4) >> Sep 29 15:05:52 osd-1 bash[423735]: debug 2021-09-29T13:05:52.830+0000 >> 7f6e89cc3700 0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> >> conn(0x55629242f000 0x556289dde000 :6789 s=ACCEPTING pgs=0 cs=0 >> l=0).handle_client_banner accept peer addr is really - (socket is v1: >> 172.16.35.183:47888/0) >> Sep 29 15:05:58 osd-1 bash[423735]: debug 2021-09-29T13:05:58.825+0000 >> 7f6e894c2700 0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> >> conn(0x55629b6e8800 0x5562a32e3800 :6789 s=ACCEPTING pgs=0 cs=0 >> l=0).handle_client_banner accept peer addr is really - (socket is v1: >> 172.16.35.182:42746/0) >> Sep 29 15:06:03 osd-1 bash[423735]: debug 2021-09-29T13:05:59.667+0000 >> 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 266 >> slow ops, oldest is mon_command([{prefix=config-key set, >> key=mgr/cephadm/host.osd-2}] v 0) >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c >> 29405655..29406327) collect timeout, calling fresh election >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN >> 1 clients failing to respond to cache pressure; 1/5 mons down, quorum >> osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this >> doesn't resolve itself): 5 pgs backfill_toofull >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 >> clients 
failing to respond to cache pressure >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : >> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to >> respond to cache pressure client_id: 56229355 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons >> down, quorum osd-1,osd-2,osd-5,osd-4 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) >> addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of >> quorum) >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: >> Low space hindering backfill (add storage if this doesn't resolve itself): >> 5 pgs backfill_toofull >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is >> active+remapped+backfill_toofull, acting [145,128,87] >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is >> active+remapped+backfill_toofull, acting [133,24,107] >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is >> active+remapped+backfill_toofull, acting [100,90,82] >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is >> active+remapped+backfill_toofull, acting [155,27,106] >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is >> active+remapped+backfill_toofull, acting [153,73,114] >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >> election >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 >> 7f6e854ba700 1 paxos.0).electionLogic(26616) init, last seen epoch 26616 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.064+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop >> unexpected msg >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "status", "format": "json-pretty"} v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 >> 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? 
>> 172.16.62.11:0/4154945587' entity='client.admin' cmd=[{"prefix": >> "status", "format": "json-pretty"}]: dispatch >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.068+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.072+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.082+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.287+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.488+0000 >> 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "df", "detail": "detail"} v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 >> 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? 
>> 172.16.62.11:0/1624876515' entity='client.admin' cmd=[{"prefix": "df", >> "detail": "detail"}]: dispatch >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.889+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:05 osd-1 bash[423735]: debug 2021-09-29T13:06:05.691+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.073+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.288+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.294+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.393+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:08 osd-1 bash[423735]: debug 2021-09-29T13:06:08.216+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.034+0000 >> 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting >> 289 slow ops, oldest is mon_command([{prefix=config-key set, >> key=mgr/cephadm/host.osd-2}] v 0) >> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.064+0000 >> 7f6e87cbf700 1 paxos.0).electionLogic(26617) init, last seen epoch 26617, >> mid-election, bumping >> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.087+0000 >> 7f6e87cbf700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >> election >> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 >> 7f6e854ba700 1 paxos.0).electionLogic(26621) init, last seen epoch 26621, >> mid-election, bumping >> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.110+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.038+0000 >> 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting >> 289 slow ops, oldest is mon_command([{prefix=config-key set, >> key=mgr/cephadm/host.osd-2}] v 0) >> Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.123+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, >> mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4) >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:22.796+0000 >> 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 423 >> slow ops, oldest is mon_command([{prefix=config-key set, >> key=mgr/cephadm/host.osd-2}] v 0) >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c >> 29405655..29406327) 
collect timeout, calling fresh election >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN >> 1 clients failing to respond to cache pressure; 1/5 mons down, quorum >> osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this >> doesn't resolve itself): 5 pgs backfill_toofull >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 >> clients failing to respond to cache pressure >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : >> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to >> respond to cache pressure client_id: 56229355 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons >> down, quorum osd-1,osd-2,osd-5,osd-4 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) >> addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of >> quorum) >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: >> Low space hindering backfill (add storage if this doesn't resolve itself): >> 5 pgs backfill_toofull >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is >> active+remapped+backfill_toofull, acting [145,128,87] >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is >> active+remapped+backfill_toofull, acting [133,24,107] >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is >> active+remapped+backfill_toofull, acting [100,90,82] >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is >> active+remapped+backfill_toofull, acting [155,27,106] >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is >> active+remapped+backfill_toofull, acting [153,73,114] >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >> election >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 >> 7f6e854ba700 1 paxos.0).electionLogic(26624) init, last seen epoch 26624 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.253+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.254+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop >> unexpected msg >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.256+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.258+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command([{prefix=config-key set, 
key=mgr/cephadm/host.osd-5}] v 0) v1 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >> election >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 >> 7f6e854ba700 1 paxos.0).electionLogic(26627) init, last seen epoch 26627, >> mid-election, bumping >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.282+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.050+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.250+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.651+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> >> osd-1 # journalctl -f -u >> ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxxxxxxxxx-1.qkzuas.service >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdc463500 >> client_session(request_renewcaps seq 88463) from client.60598827 v1: >> 172.16.59.39:0/1389838619 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece3a0cfc0 >> client_session(request_renewcaps seq 88463) from client.60598821 v1: >> 172.16.59.39:0/858534994 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece1e24540 >> client_session(request_renewcaps seq 88459) from client.60591845 v1: >> 172.16.59.7:0/1705034209 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece055f340 >> client_session(request_renewcaps seq 88462) from client.60598851 v1: >> 172.16.59.26:0/763945533 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdcb97c00 >> client_session(request_renewcaps seq 88459) from client.60591994 v1: >> 172.16.59.7:0/4158829178 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 
0x55ecdfa9bc00 >> client_session(request_renewcaps seq 86286) from client.60712226 v1: >> 172.16.59.64:0/1098377799 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ec336dc000 >> client_session(request_renewcaps seq 88463) from client.60591563 v1: >> 172.16.59.39:0/1765846930 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdae976c0 >> client_session(request_renewcaps seq 86592) from client.60695401 v1: >> 172.16.59.27:0/2213843285 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdf211a40 >> client_session(request_renewcaps seq 88461) from client.60599085 v1: >> 172.16.59.19:0/1476359719 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdec1d340 >> client_session(request_renewcaps seq 88463) from client.60591566 v1: >> 172.16.59.39:0/3197981635 >> > _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx