Re: Leader election loop reappears

Dear all,

Again answering my own emails... it turns out that the connection resets
are not problematic. I have documented this in the tracker in the hope
that users with similar issues can find my results:

https://tracker.ceph.com/issues/52825

As for my original issue: as described earlier, I went and `ceph orch
rm`ed all but the bare essentials. I then continued adding services back
daemon by daemon and now also have the dashboard and my second rgw daemon
running. After enabling each one, I waited overnight to see whether the
problem reappeared - so far everything works fine. I don't know what
caused the problem and will do more debugging if it reappears.
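
For reference, here is a rough sketch of the kind of strip-down and re-add
cycle I mean (the service names are the ones from my cluster; the placement
flags are illustrative rather than an exact transcript of what I typed):

```
# strip the cluster down to the bare essentials (keeping mon, mgr, mds, osd)
ceph orch rm ingress.rgw.default
ceph orch rm rgw.default
ceph orch rm prometheus
ceph mgr module disable dashboard

# then add services back one at a time, waiting overnight after each step
ceph orch apply prometheus
ceph orch apply rgw default --placement='label:rgw'
ceph mgr module enable dashboard
ceph -s    # watch health and election behaviour after each step
```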

As for the problem with two ingress services and two rgws, I have opened a
ticket here:

https://tracker.ceph.com/issues/52826

Cheers,
Manuel

On Sun, Oct 3, 2021 at 11:47 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx>
wrote:

> After still more digging, I found high numbers of failed connection
> attempts on my osd nodes; see the netstat output at the bottom (nstat is
> also useful, as it allows resetting the counters). The number of failed
> connection attempts seemed too high. I found an old thread on the mailing
> list that recommended enabling logging of reset connections to syslog:
>
> ```
> iptables -I INPUT -p tcp -m tcp --tcp-flags RST RST -j LOG
> ```
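>
> With that rule in place, this is roughly how I watched the counters and the
> logged resets (a sketch; counter names and the exact log destination depend
> on the distribution):
>
> ```
> # print and reset TCP counter deltas since the last invocation
> nstat | grep -i -e estabresets -e attemptfails
>
> # the iptables LOG target writes to the kernel log; watch for RST entries
> journalctl -k -f | grep RST
> # or, with classic rsyslog on CentOS 7:
> grep RST /var/log/messages | tail
> ```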
>
> This was very useful, and I saw a lot of failed connection attempts to
> port 8443, so something related to the dashboard. I also noticed a lot of
> "beast" error messages, which appear to be related to RGW. So I stopped
> everything except for the bare essentials of mds, mgr, mon, and osd.
>
> The cluster appeared to stabilize after a full reboot. It is hard to
> judge whether this would hold up for long - the problem sometimes appeared
> only after several hours. Next, I deployed prometheus with `ceph orch` and
> everything remained OK.
>
> I then began to `ceph orch deploy rgw` for my default realm, which caused
> no apparent problem. But once I deployed the ingress service for it with
> the following YAML:
>
> ```
> service_type: ingress
> service_id: rgw.default
> placement:
>   count: 6
> spec:
>   backend_service: rgw.default
>   virtual_ip: 172.16.62.26/19
>   frontend_port: 443
>   monitor_port: 1967
>   ssl_cert: |
>     -----BEGIN PRIVATE KEY-----
>     ...
> ```
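>
> For completeness, this is roughly how the spec above gets applied and
> checked (a sketch; the file name is just mine):
>
> ```
> ceph orch apply -i ingress.rgw.default.yaml   # apply the spec shown above
> ceph orch ls ingress                          # service-level view
> ceph orch ps | grep ingress                   # per-daemon haproxy/keepalived view
> ```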
>
> I began seeing **a lot** of beast debug messages as follows:
>
> ```
> Oct  3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.096+0000 7f80a9398700  1 ====== req done req=0x7f81e6127620 op status=0 http_status=200 latency=0.000999998s ======
> Oct  3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.096+0000 7f80a9398700  1 beast: 0x7f81e6127620: 172.16.62.11 - anonymous [03/Oct/2021:09:21:00.095 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000999998s
> Oct  3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f80d1be9700  1 ====== starting new request req=0x7f81e6127620 =====
> Oct  3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f819d580700  0 ERROR: client_io->complete_request() returned Connection reset by peer
> Oct  3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f819d580700  1 ====== req done req=0x7f81e6127620 op status=0 http_status=200 latency=0.000000000s ======
> Oct  3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f819d580700  1 beast: 0x7f81e6127620: 172.16.62.12 - anonymous [03/Oct/2021:09:21:00.568 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000000000s
> Oct  3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.583+0000 7f80c1bc9700  1 ====== starting new request req=0x7f81e6127620 =====
> ```
>
> and the TCP connection reset counters started to jump again (the monitors
> remained stable, though). To me this indicates that haproxy is most
> probably the source of the high number of "connection resets received",
> possibly unrelated to my cluster stability. Note the health-check setting
> `option httpchk HEAD / HTTP/1.0`; the full haproxy and keepalived configs
> are below.
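>
> To check whether the resets really come from the HTTP/1.0 health checks
> rather than from real clients, the probe can be reproduced by hand (a
> sketch; curl's `-0` forces HTTP/1.0 just like `option httpchk HEAD /
> HTTP/1.0` does):
>
> ```
> # snapshot the reset counters
> netstat -s | grep -i reset
>
> # send the same kind of probe haproxy sends on every check interval
> curl -sS -0 -I http://172.16.62.10:8000/
>
> # compare again; if the counters only jump for haproxy's own checks, the
> # resets come from how haproxy tears down the check connection
> netstat -s | grep -i reset
> ```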
>
> This leads me to the question:
>
> - Is this normal / to be expected?
>
> I found this StackOverflow thread on the topic:
>
> - https://stackoverflow.com/questions/21550337/haproxy-netty-way-to-prevent-exceptions-on-connection-reset/40005338#40005338
>
> I now have the following `ceph orch ls` output and will wait overnight to
> see whether things remain stable. My feeling is that prometheus should not
> destabilize things, and I can live with the other services being disabled
> for a while.
>
> ```
> # ceph orch ls
> NAME                 PORTS                  RUNNING  REFRESHED  AGE  PLACEMENT
> ingress.rgw.default  172.16.62.26:443,1967    12/12  5m ago     16m  count:6
> mds.cephfs                                      2/2  4m ago     4d   count-per-host:1;label:mds
> mgr                                             5/5  5m ago     5d   count:5
> mon                                             5/5  5m ago     2d   count:5
> osd.unmanaged                               180/180  5m ago     -    <unmanaged>
> prometheus           ?:9095                     2/2  3m ago     18m  count:2
> rgw.default          ?:8000                     6/6  5m ago     25m  count-per-host:1;label:rgw
> ```
>
> Cheers,
> Manuel
>
> ```
> # output of netstat -s | grep -A 10 '^Tcp:'
> + ssh osd-1 netstat -s
> Tcp:
>     1043521 active connections openings
>     449583 passive connection openings
>     28923 failed connection attempts
>     310376 connection resets received
>     12100 connections established
>     389101110 segments received
>     590111283 segments send out
>     722988 segments retransmited
>     180 bad segments received.
>     260749 resets sent
> ```
>
> ```
> # cat
> /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/keepalived.rgw.default.osd-1.vrjiew/keepalived.conf
> # This file is generated by cephadm.
> vrrp_script check_backend {
>     script "/usr/bin/curl http://localhost:1967/health";
>     weight -20
>     interval 2
>     rise 2
>     fall 2
> }
>
> vrrp_instance VI_0 {
>   state MASTER
>   priority 100
>   interface bond0
>   virtual_router_id 51
>   advert_int 1
>   authentication {
>       auth_type PASS
>       auth_pass qghwhcnanqsltihgtpsm
>   }
>   unicast_src_ip 172.16.62.10
>   unicast_peer {
>     172.16.62.11
>     172.16.62.12
>     172.16.62.13
>     172.16.62.30
>     172.16.62.31
>   }
>   virtual_ipaddress {
>     172.16.62.26/19 dev bond0
>   }
>   track_script {
>       check_backend
>   }
> }
>
> # cat
> /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/haproxy.rgw.default.osd-1.urpnuu/haproxy/haproxy.cfg
> # This file is generated by cephadm.
> global
>     log         127.0.0.1 local2
>     chroot      /var/lib/haproxy
>     pidfile     /var/lib/haproxy/haproxy.pid
>     maxconn     8000
>     daemon
>     stats socket /var/lib/haproxy/stats
>
> defaults
>     mode                    http
>     log                     global
>     option                  httplog
>     option                  dontlognull
>     option http-server-close
>     option forwardfor       except 127.0.0.0/8
>     option                  redispatch
>     retries                 3
>     timeout queue           20s
>     timeout connect         5s
>     timeout http-request    1s
>     timeout http-keep-alive 5s
>     timeout client          1s
>     timeout server          1s
>     timeout check           5s
>     maxconn                 8000
>
> frontend stats
>     mode http
>     bind *:1967
>     stats enable
>     stats uri /stats
>     stats refresh 10s
>     stats auth admin:ivlgujuagrksajemsqyg
>     http-request use-service prometheus-exporter if { path /metrics }
>     monitor-uri /health
>
> frontend frontend
>     bind *:443 ssl crt /var/lib/haproxy/haproxy.pem
>     default_backend backend
>
> backend backend
>     option forwardfor
>     balance static-rr
>     option httpchk HEAD / HTTP/1.0
>     server rgw.default.osd-1.xqrjwp 172.16.62.10:8000 check weight 100
>     server rgw.default.osd-2.lopjij 172.16.62.11:8000 check weight 100
>     server rgw.default.osd-3.plbqka 172.16.62.12:8000 check weight 100
>     server rgw.default.osd-4.jvkhen 172.16.62.13:8000 check weight 100
>     server rgw.default.osd-5.hjxnrb 172.16.62.30:8000 check weight 100
>     server rgw.default.osd-6.bdrxdd 172.16.62.31:8000 check weight 100
> ```
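>
> Given the configs above, the balancer side can also be poked directly on a
> node (a sketch; the credentials are the ones cephadm generated into the
> stats frontend):
>
> ```
> # keepalived's own health check target (haproxy's monitor-uri)
> curl -s http://localhost:1967/health
>
> # haproxy stats page and prometheus metrics (the latter may need the same auth)
> curl -s -u admin:ivlgujuagrksajemsqyg http://localhost:1967/stats
> curl -s http://localhost:1967/metrics
> ```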
>
> On Sat, Oct 2, 2021 at 2:32 PM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx>
> wrote:
>
>> Dear all,
>>
>> I previously sent an email to the list regarding something that I called
>> a "leader election loop". The problem has reappeared several times and I
>> don't know how to proceed with debugging or fixing it.
>>
>> I have 6 nodes osd-{1..6} and monitors are on osd-{1..5}. I run ceph
>> 15.2.14 using cephadm on CentOS 7.9 (kernel 3.10.0-1160.42.2.el7.x86_64).
>>
>> The symptoms are (see also my previous email):
>>
>> - `ceph -s` takes a long time or does not return
>> --- I sometimes see the message "monclient: get_monmap_and_config failed
>> to get config"
>> --- I sometimes see the message "problem getting command descriptions from
>> mon.osd-2" (it always works via the admin socket, of course; see the
>> sketch after this list)
>> - I sometimes see all daemons out of quorum in `ceph -s`
>> - different monitors go out of quorum and come back in
>> - leader election is re-initiated every few seconds
>> - the monitors appear to transition correctly between "electing" and
>> "peon", but the issue is that a leader election is performed every few
>> seconds...
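>>
>> A sketch of the admin-socket checks mentioned above, run on a mon host
>> (under cephadm the socket lives inside the container; mon.osd-2 is just an
>> example):
>>
>> ```
>> # enter the running mon container on the affected host
>> cephadm enter --name mon.osd-2
>>
>> # these answer via the local admin socket even when `ceph -s` hangs
>> ceph daemon mon.osd-2 mon_status
>> ceph daemon mon.osd-2 quorum_status
>> ceph daemon mon.osd-2 ops            # slow/stuck mon_commands in flight
>> ```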
>>
>> I have done all the checks in the "troubleshooting monitors" documentation
>> up to the point where it says "reach out to the community". In particular,
>> I checked mon_status on each monitor: each monitor sees all the others on
>> the correct public IP, and I can telnet to ports 3300 and 6789 from each
>> monitor to all the others.
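>>
>> For completeness, the reachability check can be scripted instead of using
>> telnet (a sketch using bash's /dev/tcp):
>>
>> ```
>> for host in osd-1 osd-2 osd-3 osd-4 osd-5; do
>>   for port in 3300 6789; do
>>     timeout 2 bash -c "</dev/tcp/$host/$port" \
>>       && echo "$host:$port open" \
>>       || echo "$host:$port CLOSED"
>>   done
>> done
>> ```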
>>
>> I have bumped the nf_conntrack settings, although I don't see any entries
>> in the syslog yet about dropped packets. `netstat -s` shows a few dropped
>> packets (e.g., 172 outgoing packets dropped, 18 dropped because of missing
>> route).
>>
>> Also, I have added public servers and the cluster nodes themselves to
>> chrony.conf (see below). The output of `chronyc sources -v` indicates to
>> me that the cluster itself is in sync and that the clock offsets between
>> the nodes are in the microsecond range.
>>
>> I was able to inject the debug level increase to 10/10 into the monitors;
>> I had to repeat it for one out-of-quorum monitor that first said "Error
>> ENXIO: problem getting command descriptions from mon.osd-5" but then
>> accepted it via `ceph tell`.
>>
>> I have pulled the logs for two minutes while the cluster was running its
>> leader election loop and attached them. They are a couple of thousand lines
>> each and should show the problem. I'd be happy to send fewer or more lines,
>> though.
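>>
>> In case it matters, the log windows were pulled roughly like this (a
>> sketch; the unit name follows the cephadm `ceph-<fsid>@<daemon>` pattern):
>>
>> ```
>> fsid=55633ec3-6c0c-4a02-990c-0f87e0f7a01f
>> journalctl -u "ceph-${fsid}@mon.osd-1.service" \
>>   --since "2 minutes ago" > mon.osd-1.log
>> ```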
>>
>> I'd be happy about any help or suggestions towards a resolution.
>>
>> Best wishes,
>> Manuel
>>
>> ```
>> # 2>/dev/null sysctl -a | grep nf_ | egrep 'max|bucket'
>> net.netfilter.nf_conntrack_buckets = 2500096
>> net.netfilter.nf_conntrack_expect_max = 39060
>> net.netfilter.nf_conntrack_max = 10000000
>> net.netfilter.nf_conntrack_tcp_max_retrans = 3
>> net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
>> net.nf_conntrack_max = 10000000
>> ```
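>>
>> A quick way to see how close the conntrack table actually gets to those
>> limits (a sketch):
>>
>> ```
>> # current vs. maximum number of tracked connections
>> cat /proc/sys/net/netfilter/nf_conntrack_count
>> cat /proc/sys/net/netfilter/nf_conntrack_max
>>
>> # per-CPU error/drop statistics, if conntrack-tools is installed
>> conntrack -S
>> ```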
>>
>> ```
>> # from chrony.conf
>> server 172.16.35.140 iburst
>> server 172.16.35.141 iburst
>> server 172.16.35.142 iburst
>> server osd-1 iburst
>> server osd-2 iburst
>> server osd-3 iburst
>> server osd-4 iburst
>> server osd-5 iburst
>> server osd-6 iburst
>> server 0.de.pool.ntp.org iburst
>> server 1.de.pool.ntp.org iburst
>> server 2.de.pool.ntp.org iburst
>> server 3.de.pool.ntp.org iburst
>> ```
>>
>> ```
>> # chronyc sources -v
>> 210 Number of sources = 13
>>
>>   .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
>>  / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
>> | /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
>> ||                                                 .- xxxx [ yyyy ] +/- zzzz
>> ||      Reachability register (octal) -.           |  xxxx = adjusted offset,
>> ||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
>> ||                                \     |          |  zzzz = estimated error.
>> ||                                 |    |           \
>> MS Name/IP address         Stratum Poll Reach LastRx Last sample
>> ===============================================================================
>> ^- 172.16.35.140                 3   6   377    55   +213us[ +213us] +/-   26ms
>> ^+ 172.16.35.141                 2   6   377    63   +807us[ +807us] +/-   12ms
>> ^+ 172.16.35.142                 3   9   377   253  +1488us[+1488us] +/- 7675us
>> ^+ osd-1                         3   6   377    62   +145us[ +145us] +/- 7413us
>> ^+ osd-2                         2   6   377    61  -6577ns[-6577ns] +/- 8108us
>> ^+ osd-3                         4   6   377    50   +509us[ +509us] +/- 6810us
>> ^+ osd-4                         4   6   377    54   +447us[ +447us] +/- 7231us
>> ^+ osd-5                         3   6   377    52   +252us[ +252us] +/- 6738us
>> ^+ osd-6                         2   6   377    56    -13us[  -13us] +/- 8563us
>> ^+ funky.f5s.de                  2   8   377   207   +371us[ +371us] +/-   24ms
>> ^- hetzner01.ziegenberg.at       2  10   377   445   +735us[ +685us] +/-   32ms
>> ^* time1.uni-paderborn.de        1   9   377   253  -4246us[-4297us] +/- 9089us
>> ^- 25000-021.cloud.services>     2  10   377   147   +832us[ +832us] +/-   48ms
>>
>> On Wed, Sep 29, 2021 at 3:43 PM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx>
>> wrote:
>>
>>> Dear all,
>>>
>>> I was a bit too optimistic in my previous email. It looks like the leader
>>> election loop has reappeared. I could work around it by stopping the rogue
>>> mon daemon, but I don't know how to fix it for good.
>>>
>>> I'm running a 16.2.6 Ceph cluster on CentOS 7.9 servers (6 servers in
>>> total). I have about 35 HDDs and 4 SSDs in each server. The servers have
>>> about 250 GB of RAM, and there is no memory pressure on any daemon. I have
>>> an identical mirror cluster that does not have the issue (but that one
>>> does not have its file system mounted elsewhere and runs no rgws). I
>>> recently migrated both clusters to cephadm and then from Octopus to
>>> Pacific.
>>>
>>> The primary cluster has the following problems (pulled from the cluster
>>> before fixing/restarting the mon daemon):
>>>
>>> - `ceph -s` and other commands feel pretty sluggish
>>> - `ceph -s` shows inconsistent results in the "health" section and
>>> "services" overview
>>> - cephfs clients hang and after rebooting the clients, mounting is not
>>> possible any more
>>> - `ceph config dump` prints "monclient: get_monmap_and_config failed to
>>> get config"
>>> - I have a mon leader election loop shown in its journalctl output on
>>> the bottom.
>>> - the primary mds daemon logs things like "skipping upkeep work because
>>> connection to Monitors appears laggy" and "ms_deliver_dispatch: unhandled
>>> message 0x55ecdec1d340 client_session(request_renewcaps seq 88463) from
>>> client.60591566 v1:172.16.59.39:0/3197981635" in its journalctl output
>>>
>>> I tried rebooting the client that is supposedly not reacting to cache
>>> pressure, but that did not help either. The servers are connected to the
>>> same VLT switch pair and use LACP 2x40GbE for the cluster network and
>>> 2x10GbE for the public network. I have disabled firewalld on the nodes,
>>> but that did not fix the problem either. I suspect that the "laggy
>>> monitors" are more probably caused on the software side than on the
>>> network side.
>>>
>>> I took down the rogue mon.osd-1 with `docker stop`, and it looks like the
>>> problem then disappears.
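>>>
>>> For reference, stopping (and later restarting) the rogue mon under cephadm
>>> looks roughly like this (a sketch; the systemd unit route should be
>>> equivalent to the plain `docker stop`):
>>>
>>> ```
>>> # find the mon container on the host
>>> docker ps | grep mon.osd-1
>>>
>>> # stop it directly...
>>> docker stop <container-id>
>>> # ...or via the systemd unit that cephadm manages
>>> systemctl stop ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@mon.osd-1.service
>>> systemctl start ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@mon.osd-1.service
>>> ```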
>>>
>>> To summarize: I suspect the cause is connected to the mon daemons. I have
>>> found that similar problems have been reported a couple of times.
>>>
>>> What is the best way forward? It seems that the general suggestion for
>>> such cases is to just `ceph orch redeploy mon`, so I did this.
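>>>
>>> Concretely, the redeploy step looks roughly like this (a sketch; a
>>> per-daemon variant exists as well):
>>>
>>> ```
>>> # redeploy the whole mon service
>>> ceph orch redeploy mon
>>>
>>> # or just the one suspicious daemon
>>> ceph orch daemon redeploy mon.osd-1
>>> ```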
>>>
>>> Is there any way to find out the root cause to get rid of it?
>>>
>>> Best wishes,
>>> Manuel
>>>
>>> osd-1 # ceph -s
>>>   cluster:
>>>     id:     55633ec3-6c0c-4a02-990c-0f87e0f7a01f
>>>     health: HEALTH_WARN
>>>             1 clients failing to respond to cache pressure
>>>             1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
>>>             Low space hindering backfill (add storage if this doesn't
>>> resolve itself): 5 pgs backfill_toofull
>>>
>>>   services:
>>>     mon: 5 daemons, quorum  (age 4h), out of quorum: osd-1, osd-2,
>>> osd-5, osd-4, osd-3
>>>     mgr: osd-4.oylrhe(active, since 2h), standbys: osd-1, osd-3,
>>> osd-5.jcfyqe, osd-2
>>>     mds: 1/1 daemons up, 1 standby
>>>     osd: 180 osds: 180 up (since 4h), 164 in (since 6h); 285 remapped pgs
>>>     rgw: 12 daemons active (6 hosts, 2 zones)
>>>
>>>   data:
>>>     volumes: 1/1 healthy
>>>     pools:   14 pools, 5322 pgs
>>>     objects: 263.18M objects, 944 TiB
>>>     usage:   1.4 PiB used, 639 TiB / 2.0 PiB avail
>>>     pgs:     25576348/789544299 objects misplaced (3.239%)
>>>              5026 active+clean
>>>              291  active+remapped+backfilling
>>>              5    active+remapped+backfill_toofull
>>>
>>>   io:
>>>     client:   165 B/s wr, 0 op/s rd, 0 op/s wr
>>>     recovery: 2.3 GiB/s, 652 objects/s
>>>
>>>   progress:
>>>     Global Recovery Event (53m)
>>>       [==========================..] (remaining: 3m)
>>>
>>> osd-1 # ceph health detail
>>> HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons
>>> down, quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add
>>> storage if this doesn't resolve itself): 5 pgs backfill_toofull
>>> [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
>>>     mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing
>>> to respond to cache pressure client_id: 56229355
>>> [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
>>>     mon.osd-3 (rank 4) addr [v2:
>>> 172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of quorum)
>>> [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if
>>> this doesn't resolve itself): 5 pgs backfill_toofull
>>>     pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87]
>>>     pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107]
>>>     pg 3.3cb is active+remapped+backfill_toofull, acting [100,90,82]
>>>     pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106]
>>>     pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114]
>>>
>>>
>>> osd-1 # journalctl -f -u
>>> ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxx-1.service
>>> -- Logs begin at Wed 2021-09-29 08:52:53 CEST. --
>>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name":
>>> "osd_memory_target"} v 0) v1
>>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000
>>> 7f6e854ba700  0 log_channel(audit) log [INF] : from='mgr.66351528 '
>>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name":
>>> "osd_memory_target"}]: dispatch
>>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000
>>> 7f6e844b8700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000
>>> 7f6e844b8700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000
>>> 7f6e844b8700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000
>>> 7f6e844b8700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command({"prefix": "df", "detail": "detail"} v 0) v1
>>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000
>>> 7f6e854ba700  0 log_channel(audit) log [DBG] : from='client.?
>>> 172.16.62.12:0/2081332311' entity='client.admin' cmd=[{"prefix": "df",
>>> "detail": "detail"}]: dispatch
>>> Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000
>>> 7f6e844b8700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000
>>> 7f6e844b8700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:05:35 osd-1 bash[423735]: debug 2021-09-29T13:05:35.195+0000
>>> 7f6e89cc3700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.045+0000
>>> 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting
>>> 85 slow ops, oldest is mon_command([{prefix=config-key set,
>>> key=mgr/cephadm/host.osd-2}] v 0)
>>> Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.205+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [INF] : mon.osd-1 is new leader,
>>> mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4)
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:46.215+0000
>>> 7f6e854ba700  0 mon.osd-1@0(leader) e11 handle_command
>>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting
>>> 173 slow ops, oldest is mon_command([{prefix=config-key set,
>>> key=mgr/cephadm/host.osd-2}] v 0)
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700  1 mon.osd-1@0(leader).paxos(paxos recovering c
>>> 29405655..29406327) collect timeout, calling fresh election
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN
>>> 1 clients failing to respond to cache pressure; 1/5 mons down, quorum
>>> osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this
>>> doesn't resolve itself): 5 pgs backfill_toofull
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1
>>> clients failing to respond to cache pressure
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :
>>> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to
>>> respond to cache pressure client_id: 56229355
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons
>>> down, quorum osd-1,osd-2,osd-5,osd-4
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     mon.osd-3 (rank 4)
>>> addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of
>>> quorum)
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL:
>>> Low space hindering backfill (add storage if this doesn't resolve itself):
>>> 5 pgs backfill_toofull
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.23d is
>>> active+remapped+backfill_toofull, acting [145,128,87]
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.33f is
>>> active+remapped+backfill_toofull, acting [133,24,107]
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.3cb is
>>> active+remapped+backfill_toofull, acting [100,90,82]
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.3fc is
>>> active+remapped+backfill_toofull, acting [155,27,106]
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.665 is
>>> active+remapped+backfill_toofull, acting [153,73,114]
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000
>>> 7f6e854ba700  0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor
>>> election
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000
>>> 7f6e854ba700  1 paxos.0).electionLogic(26610) init, last seen epoch 26610
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.533+0000
>>> 7f6e854ba700  1 mon.osd-1@0(electing) e11 collect_metadata md126:  no
>>> unique device id for md126: fallback method has no model nor serial'
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000
>>> 7f6e854ba700  0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor
>>> election
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000
>>> 7f6e854ba700  1 paxos.0).electionLogic(26613) init, last seen epoch 26613,
>>> mid-election, bumping
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000
>>> 7f6e854ba700  1 mon.osd-1@0(electing) e11 collect_metadata md126:  no
>>> unique device id for md126: fallback method has no model nor serial'
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000
>>> 7f6e854ba700  1 mon.osd-1@0(electing) e11 handle_timecheck drop
>>> unexpected msg
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.551+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000
>>> 7f6e894c2700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.554+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000
>>> 7f6e894c2700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name":
>>> "osd_memory_target"} v 0) v1
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000
>>> 7f6e854ba700  0 log_channel(audit) log [INF] : from='mgr.66351528 '
>>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name":
>>> "osd_memory_target"}]: dispatch
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name":
>>> "osd_memory_target"} v 0) v1
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000
>>> 7f6e854ba700  0 log_channel(audit) log [INF] : from='mgr.66351528 '
>>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name":
>>> "osd_memory_target"}]: dispatch
>>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.572+0000
>>> 7f6e854ba700  0 log_channel(cluster) log [INF] : mon.osd-1 is new leader,
>>> mons osd-1,osd-2,osd-5,osd-4,osd-3 in quorum (ranks 0,1,2,3,4)
>>> Sep 29 15:05:52 osd-1 bash[423735]: debug 2021-09-29T13:05:52.830+0000
>>> 7f6e89cc3700  0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >>
>>>  conn(0x55629242f000 0x556289dde000 :6789 s=ACCEPTING pgs=0 cs=0
>>> l=0).handle_client_banner accept peer addr is really - (socket is v1:
>>> 172.16.35.183:47888/0)
>>> Sep 29 15:05:58 osd-1 bash[423735]: debug 2021-09-29T13:05:58.825+0000
>>> 7f6e894c2700  0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >>
>>>  conn(0x55629b6e8800 0x5562a32e3800 :6789 s=ACCEPTING pgs=0 cs=0
>>> l=0).handle_client_banner accept peer addr is really - (socket is v1:
>>> 172.16.35.182:42746/0)
>>> Sep 29 15:06:03 osd-1 bash[423735]: debug 2021-09-29T13:05:59.667+0000
>>> 7f6e854ba700  0 mon.osd-1@0(leader) e11 handle_command
>>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting
>>> 266 slow ops, oldest is mon_command([{prefix=config-key set,
>>> key=mgr/cephadm/host.osd-2}] v 0)
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700  1 mon.osd-1@0(leader).paxos(paxos recovering c
>>> 29405655..29406327) collect timeout, calling fresh election
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN
>>> 1 clients failing to respond to cache pressure; 1/5 mons down, quorum
>>> osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this
>>> doesn't resolve itself): 5 pgs backfill_toofull
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1
>>> clients failing to respond to cache pressure
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :
>>> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to
>>> respond to cache pressure client_id: 56229355
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons
>>> down, quorum osd-1,osd-2,osd-5,osd-4
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     mon.osd-3 (rank 4)
>>> addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of
>>> quorum)
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL:
>>> Low space hindering backfill (add storage if this doesn't resolve itself):
>>> 5 pgs backfill_toofull
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.23d is
>>> active+remapped+backfill_toofull, acting [145,128,87]
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.33f is
>>> active+remapped+backfill_toofull, acting [133,24,107]
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.3cb is
>>> active+remapped+backfill_toofull, acting [100,90,82]
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.3fc is
>>> active+remapped+backfill_toofull, acting [155,27,106]
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.665 is
>>> active+remapped+backfill_toofull, acting [153,73,114]
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000
>>> 7f6e854ba700  0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor
>>> election
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000
>>> 7f6e854ba700  1 paxos.0).electionLogic(26616) init, last seen epoch 26616
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.064+0000
>>> 7f6e854ba700  1 mon.osd-1@0(electing) e11 collect_metadata md126:  no
>>> unique device id for md126: fallback method has no model nor serial'
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000
>>> 7f6e854ba700  1 mon.osd-1@0(electing) e11 handle_timecheck drop
>>> unexpected msg
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command({"prefix": "status", "format": "json-pretty"} v 0) v1
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000
>>> 7f6e854ba700  0 log_channel(audit) log [DBG] : from='client.?
>>> 172.16.62.11:0/4154945587' entity='client.admin' cmd=[{"prefix":
>>> "status", "format": "json-pretty"}]: dispatch
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.068+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.072+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.082+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name":
>>> "osd_memory_target"} v 0) v1
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000
>>> 7f6e854ba700  0 log_channel(audit) log [INF] : from='mgr.66351528 '
>>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name":
>>> "osd_memory_target"}]: dispatch
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name":
>>> "osd_memory_target"} v 0) v1
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000
>>> 7f6e854ba700  0 log_channel(audit) log [INF] : from='mgr.66351528 '
>>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name":
>>> "osd_memory_target"}]: dispatch
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.287+0000
>>> 7f6e844b8700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.488+0000
>>> 7f6e89cc3700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command({"prefix": "df", "detail": "detail"} v 0) v1
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000
>>> 7f6e854ba700  0 log_channel(audit) log [DBG] : from='client.?
>>> 172.16.62.11:0/1624876515' entity='client.admin' cmd=[{"prefix": "df",
>>> "detail": "detail"}]: dispatch
>>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.889+0000
>>> 7f6e894c2700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:06:05 osd-1 bash[423735]: debug 2021-09-29T13:06:05.691+0000
>>> 7f6e894c2700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.073+0000
>>> 7f6e894c2700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.288+0000
>>> 7f6e894c2700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.294+0000
>>> 7f6e894c2700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.393+0000
>>> 7f6e894c2700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:06:08 osd-1 bash[423735]: debug 2021-09-29T13:06:08.216+0000
>>> 7f6e894c2700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.034+0000
>>> 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting
>>> 289 slow ops, oldest is mon_command([{prefix=config-key set,
>>> key=mgr/cephadm/host.osd-2}] v 0)
>>> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.064+0000
>>> 7f6e87cbf700  1 paxos.0).electionLogic(26617) init, last seen epoch 26617,
>>> mid-election, bumping
>>> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.087+0000
>>> 7f6e87cbf700  1 mon.osd-1@0(electing) e11 collect_metadata md126:  no
>>> unique device id for md126: fallback method has no model nor serial'
>>> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000
>>> 7f6e854ba700  0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor
>>> election
>>> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000
>>> 7f6e854ba700  1 paxos.0).electionLogic(26621) init, last seen epoch 26621,
>>> mid-election, bumping
>>> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.110+0000
>>> 7f6e854ba700  1 mon.osd-1@0(electing) e11 collect_metadata md126:  no
>>> unique device id for md126: fallback method has no model nor serial'
>>> Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.038+0000
>>> 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting
>>> 289 slow ops, oldest is mon_command([{prefix=config-key set,
>>> key=mgr/cephadm/host.osd-2}] v 0)
>>> Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.123+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [INF] : mon.osd-1 is new leader,
>>> mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4)
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:22.796+0000
>>> 7f6e854ba700  0 mon.osd-1@0(leader) e11 handle_command
>>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting
>>> 423 slow ops, oldest is mon_command([{prefix=config-key set,
>>> key=mgr/cephadm/host.osd-2}] v 0)
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700  1 mon.osd-1@0(leader).paxos(paxos recovering c
>>> 29405655..29406327) collect timeout, calling fresh election
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN
>>> 1 clients failing to respond to cache pressure; 1/5 mons down, quorum
>>> osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this
>>> doesn't resolve itself): 5 pgs backfill_toofull
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1
>>> clients failing to respond to cache pressure
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :
>>> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to
>>> respond to cache pressure client_id: 56229355
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons
>>> down, quorum osd-1,osd-2,osd-5,osd-4
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     mon.osd-3 (rank 4)
>>> addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of
>>> quorum)
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL:
>>> Low space hindering backfill (add storage if this doesn't resolve itself):
>>> 5 pgs backfill_toofull
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.23d is
>>> active+remapped+backfill_toofull, acting [145,128,87]
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.33f is
>>> active+remapped+backfill_toofull, acting [133,24,107]
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.3cb is
>>> active+remapped+backfill_toofull, acting [100,90,82]
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.3fc is
>>> active+remapped+backfill_toofull, acting [155,27,106]
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000
>>> 7f6e87cbf700  0 log_channel(cluster) log [WRN] :     pg 3.665 is
>>> active+remapped+backfill_toofull, acting [153,73,114]
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000
>>> 7f6e854ba700  0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor
>>> election
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000
>>> 7f6e854ba700  1 paxos.0).electionLogic(26624) init, last seen epoch 26624
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.253+0000
>>> 7f6e854ba700  1 mon.osd-1@0(electing) e11 collect_metadata md126:  no
>>> unique device id for md126: fallback method has no model nor serial'
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.254+0000
>>> 7f6e854ba700  1 mon.osd-1@0(electing) e11 handle_timecheck drop
>>> unexpected msg
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.256+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.258+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name":
>>> "osd_memory_target"} v 0) v1
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000
>>> 7f6e854ba700  0 log_channel(audit) log [INF] : from='mgr.66351528 '
>>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name":
>>> "osd_memory_target"}]: dispatch
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000
>>> 7f6e854ba700  0 mon.osd-1@0(electing) e11 handle_command
>>> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name":
>>> "osd_memory_target"} v 0) v1
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000
>>> 7f6e854ba700  0 log_channel(audit) log [INF] : from='mgr.66351528 '
>>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name":
>>> "osd_memory_target"}]: dispatch
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000
>>> 7f6e854ba700  0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor
>>> election
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000
>>> 7f6e854ba700  1 paxos.0).electionLogic(26627) init, last seen epoch 26627,
>>> mid-election, bumping
>>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.282+0000
>>> 7f6e854ba700  1 mon.osd-1@0(electing) e11 collect_metadata md126:  no
>>> unique device id for md126: fallback method has no model nor serial'
>>> Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.050+0000
>>> 7f6e844b8700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.250+0000
>>> 7f6e844b8700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>> Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.651+0000
>>> 7f6e844b8700  1 mon.osd-1@0(electing) e11 handle_auth_request failed to
>>> assign global_id
>>>
>>> osd-1 # journalctl -f -u
>>> ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxxxxxxxxx-1.qkzuas.service
>>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000
>>> 7f994ec61700  0 ms_deliver_dispatch: unhandled message 0x55ecdc463500
>>> client_session(request_renewcaps seq 88463) from client.60598827 v1:
>>> 172.16.59.39:0/1389838619
>>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000
>>> 7f994ec61700  0 ms_deliver_dispatch: unhandled message 0x55ece3a0cfc0
>>> client_session(request_renewcaps seq 88463) from client.60598821 v1:
>>> 172.16.59.39:0/858534994
>>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000
>>> 7f994ec61700  0 ms_deliver_dispatch: unhandled message 0x55ece1e24540
>>> client_session(request_renewcaps seq 88459) from client.60591845 v1:
>>> 172.16.59.7:0/1705034209
>>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000
>>> 7f994ec61700  0 ms_deliver_dispatch: unhandled message 0x55ece055f340
>>> client_session(request_renewcaps seq 88462) from client.60598851 v1:
>>> 172.16.59.26:0/763945533
>>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000
>>> 7f994ec61700  0 ms_deliver_dispatch: unhandled message 0x55ecdcb97c00
>>> client_session(request_renewcaps seq 88459) from client.60591994 v1:
>>> 172.16.59.7:0/4158829178
>>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000
>>> 7f994ec61700  0 ms_deliver_dispatch: unhandled message 0x55ecdfa9bc00
>>> client_session(request_renewcaps seq 86286) from client.60712226 v1:
>>> 172.16.59.64:0/1098377799
>>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000
>>> 7f994ec61700  0 ms_deliver_dispatch: unhandled message 0x55ec336dc000
>>> client_session(request_renewcaps seq 88463) from client.60591563 v1:
>>> 172.16.59.39:0/1765846930
>>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000
>>> 7f994ec61700  0 ms_deliver_dispatch: unhandled message 0x55ecdae976c0
>>> client_session(request_renewcaps seq 86592) from client.60695401 v1:
>>> 172.16.59.27:0/2213843285
>>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000
>>> 7f994ec61700  0 ms_deliver_dispatch: unhandled message 0x55ecdf211a40
>>> client_session(request_renewcaps seq 88461) from client.60599085 v1:
>>> 172.16.59.19:0/1476359719
>>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000
>>> 7f994ec61700  0 ms_deliver_dispatch: unhandled message 0x55ecdec1d340
>>> client_session(request_renewcaps seq 88463) from client.60591566 v1:
>>> 172.16.59.39:0/3197981635
>>>
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


