After still more digging, I found high numbers of failed connection attempts on my osd nodes; see the netstat output at the bottom (nstat is also useful, as it allows resetting the counters). The number of failed connection attempts looked suspiciously high. I found an old thread on the mailing list that recommended enabling logging of reset connections to syslog:

```
iptables -I INPUT -p tcp -m tcp --tcp-flags RST RST -j LOG
```

This was very useful: I saw a lot of failed connection attempts to port 8443, so something related to the dashboard. I also noticed a lot of "beast" error messages, which appear to be related to RGW. So I stopped everything except for the bare essentials of mds, mgr, mon, and osd. The cluster appeared to stabilize after a full reboot. It is hard to judge whether this will hold for long - the problem sometimes appeared only after several hours.

Next, I deployed prometheus with `ceph orch` and everything remained OK. I then began to deploy rgw with `ceph orch` for my default realm, which caused no apparent problem. But once I deployed the ingress service for it with the following YAML:

```
service_type: ingress
service_id: rgw.default
placement:
  count: 6
spec:
  backend_service: rgw.default
  virtual_ip: 172.16.62.26/19
  frontend_port: 443
  monitor_port: 1967
  ssl_cert: |
    -----BEGIN PRIVATE KEY-----
    ...
```

I began seeing **a lot** of beast debug messages as follows:

```
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.096+0000 7f80a9398700 1 ====== req done req=0x7f81e6127620 op status=0 http_status=200 latency=0.000999998s ======
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.096+0000 7f80a9398700 1 beast: 0x7f81e6127620: 172.16.62.11 - anonymous [03/Oct/2021:09:21:00.095 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000999998s
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f80d1be9700 1 ====== starting new request req=0x7f81e6127620 =====
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f819d580700 0 ERROR: client_io->complete_request() returned Connection reset by peer
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f819d580700 1 ====== req done req=0x7f81e6127620 op status=0 http_status=200 latency=0.000000000s ======
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f819d580700 1 beast: 0x7f81e6127620: 172.16.62.12 - anonymous [03/Oct/2021:09:21:00.568 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000000000s
Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.583+0000 7f80c1bc9700 1 ====== starting new request req=0x7f81e6127620 =====
```

and the TCP connection reset counters started to jump again (the monitors still remained stable). To me this indicates that haproxy is most probably the culprit behind the high number of "connection resets received", maybe unrelated to my cluster stability. Also note the health-check setting `option httpchk HEAD / HTTP/1.0`; the full haproxy and keepalived configs are below.

This leads me to the question:

- Is this normal / to be expected? I found this StackOverflow thread:
  https://stackoverflow.com/questions/21550337/haproxy-netty-way-to-prevent-exceptions-on-connection-reset/40005338#40005338

I now have the `ceph orch ls` output shown below and will wait overnight to see whether things remain stable. My feeling is that prometheus should not destabilize things, and I can live with the other services being disabled for a while. While waiting, I will keep sampling the TCP reset counters; a small sketch of how I do that follows, then the `ceph orch ls` output.
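This is just a minimal sketch, assuming iproute2's `nstat` is installed on the node; the counter names (`TcpAttemptFails`, `TcpEstabResets`, `TcpOutRsts`, `TcpRetransSegs`) are the raw `/proc/net/snmp` names behind the netstat lines shown further down, and the 10-second interval is an arbitrary choice of mine:

```
#!/usr/bin/env bash
# Sample the TCP failure/reset counters on this node.
# nstat reports the increment since its previous invocation, so the first
# call only seeds the history; the loop then prints per-interval deltas.
nstat -n    # --nooutput: update history only, print nothing
while sleep 10; do
    date
    # -z keeps counters with a zero delta in the output so lines stay comparable
    nstat -z TcpAttemptFails TcpEstabResets TcpOutRsts TcpRetransSegs
done
```

Together with the `iptables ... -j LOG` rule above, this should make it easier to correlate counter jumps with the RST sources logged to syslog.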
```
# ceph orch ls
NAME                 PORTS                  RUNNING  REFRESHED  AGE  PLACEMENT
ingress.rgw.default  172.16.62.26:443,1967  12/12    5m ago     16m  count:6
mds.cephfs                                  2/2      4m ago     4d   count-per-host:1;label:mds
mgr                                         5/5      5m ago     5d   count:5
mon                                         5/5      5m ago     2d   count:5
osd.unmanaged                               180/180  5m ago     -    <unmanaged>
prometheus           ?:9095                 2/2      3m ago     18m  count:2
rgw.default          ?:8000                 6/6      5m ago     25m  count-per-host:1;label:rgw
```

Cheers,
Manuel

```
# output of netstat -s | grep -A 10 ^Tcp:
+ ssh osd-1 netstat -s
Tcp:
    1043521 active connections openings
    449583 passive connection openings
    28923 failed connection attempts
    310376 connection resets received
    12100 connections established
    389101110 segments received
    590111283 segments send out
    722988 segments retransmited
    180 bad segments received.
    260749 resets sent
```

```
# cat /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/keepalived.rgw.default.osd-1.vrjiew/keepalived.conf
# This file is generated by cephadm.
vrrp_script check_backend {
    script "/usr/bin/curl http://localhost:1967/health"
    weight -20
    interval 2
    rise 2
    fall 2
}

vrrp_instance VI_0 {
  state MASTER
  priority 100
  interface bond0
  virtual_router_id 51
  advert_int 1
  authentication {
      auth_type PASS
      auth_pass qghwhcnanqsltihgtpsm
  }
  unicast_src_ip 172.16.62.10
  unicast_peer {
    172.16.62.11
    172.16.62.12
    172.16.62.13
    172.16.62.30
    172.16.62.31
  }
  virtual_ipaddress {
    172.16.62.26/19 dev bond0
  }
  track_script {
      check_backend
  }
}

# cat /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/haproxy.rgw.default.osd-1.urpnuu/haproxy/haproxy.cfg
# This file is generated by cephadm.
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/lib/haproxy/haproxy.pid
    maxconn 8000
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout queue 20s
    timeout connect 5s
    timeout http-request 1s
    timeout http-keep-alive 5s
    timeout client 1s
    timeout server 1s
    timeout check 5s
    maxconn 8000

frontend stats
    mode http
    bind *:1967
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth admin:ivlgujuagrksajemsqyg
    http-request use-service prometheus-exporter if { path /metrics }
    monitor-uri /health

frontend frontend
    bind *:443 ssl crt /var/lib/haproxy/haproxy.pem
    default_backend backend

backend backend
    option forwardfor
    balance static-rr
    option httpchk HEAD / HTTP/1.0
    server rgw.default.osd-1.xqrjwp 172.16.62.10:8000 check weight 100
    server rgw.default.osd-2.lopjij 172.16.62.11:8000 check weight 100
    server rgw.default.osd-3.plbqka 172.16.62.12:8000 check weight 100
    server rgw.default.osd-4.jvkhen 172.16.62.13:8000 check weight 100
    server rgw.default.osd-5.hjxnrb 172.16.62.30:8000 check weight 100
    server rgw.default.osd-6.bdrxdd 172.16.62.31:8000 check weight 100
```

On Sat, Oct 2, 2021 at 2:32 PM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:

> Dear all,
>
> I previously sent an email to the list regarding something that I called
> a "leader election" loop. The problem has reappeared several times and I
> don't know how to proceed with debugging or fixing this.
>
> I have 6 nodes osd-{1..6} and monitors are on osd-{1..5}. I run ceph
> 15.2.14 using cephadm on CentOS 7.9 (kernel 3.10.0-1160.42.2.el7.x86_64).
>
> The symptoms are (also see my previous email):
>
> - `ceph -s` takes a long time or does not return
>   --- I sometimes see messages "monclient: get_monmap_and_config failed to
>   get config"
>   --- I sometimes see messages "problem getting command descriptions from
>   mon.osd-2" (it always works with the admin socket, of course)
> - I sometimes see all daemons out of quorum in `ceph -s`
> - different monitors go out of quorum and come back in
> - leader election is reinitiated every few seconds
> - the monitors appear to go correctly between "electing" and "peon", but
>   the issue is that leader election is performed every few seconds...
>
> I have done all the checks in the "troubleshooting monitors" guide up to
> the point where it says "reach out to the community". In particular, I
> checked the mon_stats and each monitor sees all others on the correct
> public IP, and I can telnet to 3300 and 6789 from each monitor to all
> others.
>
> I have bumped the nf_conntrack settings, although I don't have any entries
> in the syslog yet about dropping packets. `netstat -s` shows a few dropped
> packets (e.g., 172 outgoing dropped, 18 dropped because of missing route).
>
> Also, I have added public servers and the cluster itself to chrony.conf
> (see below). The output of `chronyc sources -v` indicates to me that the
> cluster itself is in sync and clock skew is below 10 ns.
>
> I am able to inject the debug level 10/10 increase into the monitors; I
> had to repeat this for one out-of-quorum monitor that first said "Error
> ENXIO: problem getting command descriptions from mon.osd-5" but then
> accepted it via "ceph tell".
>
> I have pulled the logs for two minutes while the cluster was running its
> leader election loop and attached them. They are a couple of thousand
> lines each and should show the problem. I'd be happy to send fewer or
> more lines, though.
>
> I'd be happy about any help or suggestions towards a resolution.
>
> Best wishes,
> Manuel
>
> ```
> # 2>/dev/null sysctl -a | grep nf_ | egrep 'max|bucket'
> net.netfilter.nf_conntrack_buckets = 2500096
> net.netfilter.nf_conntrack_expect_max = 39060
> net.netfilter.nf_conntrack_max = 10000000
> net.netfilter.nf_conntrack_tcp_max_retrans = 3
> net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
> net.nf_conntrack_max = 10000000
> ```
>
> ```
> # from chrony.conf
> server 172.16.35.140 iburst
> server 172.16.35.141 iburst
> server 172.16.35.142 iburst
> server osd-1 iburst
> server osd-2 iburst
> server osd-3 iburst
> server osd-4 iburst
> server osd-5 iburst
> server osd-6 iburst
> server 0.de.pool.ntp.org iburst
> server 1.de.pool.ntp.org iburst
> server 2.de.pool.ntp.org iburst
> server 3.de.pool.ntp.org iburst
> ```
>
> ```
> # chronyc sources -v
> 210 Number of sources = 13
>
>   .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
>  / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
> | /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
> ||                                                 .- xxxx [ yyyy ] +/- zzzz
> ||      Reachability register (octal) -.           |  xxxx = adjusted offset,
> ||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
> ||                                \     |          |  zzzz = estimated error.
> ||                                 |    |            \
> MS Name/IP address         Stratum Poll Reach LastRx Last sample
> ===============================================================================
> ^- 172.16.35.140                 3   6   377    55    +213us[ +213us] +/-   26ms
> ^+ 172.16.35.141                 2   6   377    63    +807us[ +807us] +/-   12ms
> ^+ 172.16.35.142                 3   9   377   253   +1488us[+1488us] +/- 7675us
> ^+ osd-1                         3   6   377    62    +145us[ +145us] +/- 7413us
> ^+ osd-2                         2   6   377    61   -6577ns[-6577ns] +/- 8108us
> ^+ osd-3                         4   6   377    50    +509us[ +509us] +/- 6810us
> ^+ osd-4                         4   6   377    54    +447us[ +447us] +/- 7231us
> ^+ osd-5                         3   6   377    52    +252us[ +252us] +/- 6738us
> ^+ osd-6                         2   6   377    56     -13us[  -13us] +/- 8563us
> ^+ funky.f5s.de                  2   8   377   207    +371us[ +371us] +/-   24ms
> ^- hetzner01.ziegenberg.at       2  10   377   445    +735us[ +685us] +/-   32ms
> ^* time1.uni-paderborn.de        1   9   377   253   -4246us[-4297us] +/- 9089us
> ^- 25000-021.cloud.services>     2  10   377   147    +832us[ +832us] +/-   48ms
> ```
>
> On Wed, Sep 29, 2021 at 3:43 PM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx>
> wrote:
>
>> Dear all,
>>
>> I was a bit too optimistic in my previous email. It looks like the
>> leader election loop has reappeared. I could fix it by stopping the
>> rogue mon daemon, but I don't know how to fix it for good.
>>
>> I'm running a 16.2.6 Ceph cluster on CentOS 7.9 servers (6 servers in
>> total). I have about 35 HDDs in each server and 4 SSDs. The servers have
>> about 250 GB of RAM, and there is no memory pressure on any daemon. I
>> have an identical mirror cluster that does not have the issue (but that
>> one does not have its file system mounted elsewhere and is running no
>> rgws). I have recently migrated both clusters to cephadm and then from
>> octopus to pacific.
>>
>> The primary cluster has these problems (pulled from the cluster before
>> fixing/restarting the mon daemon):
>>
>> - `ceph -s` and other commands feel pretty sluggish
>> - `ceph -s` shows inconsistent results in the "health" section and the
>>   "services" overview
>> - cephfs clients hang, and after rebooting the clients, mounting is not
>>   possible any more
>> - `ceph config dump` prints "monclient: get_monmap_and_config failed to
>>   get config"
>> - I have a mon leader election loop, shown in its journalctl output at
>>   the bottom.
>> - the primary mds daemon says things like "skipping upkeep work because
>>   connection to Monitors appears laggy" and "ms_deliver_dispatch:
>>   unhandled message 0x55ecdec1d340 client_session(request_renewcaps seq
>>   88463) from client.60591566 v1:172.16.59.39:0/3197981635" in its
>>   journalctl output
>>
>> I tried to reboot the client that is supposedly not reacting to cache
>> pressure, but that did not help either. The servers are connected to the
>> same VLT switch pair and use LACP 2x40GbE for the cluster network and
>> 2x10GbE for the public network. I have disabled firewalld on the nodes,
>> but that did not fix the problem either. I suspect that the "laggy
>> monitors" are more likely caused on the software side than on the
>> network side.
>>
>> I took down the rogue mon.osd-1 with `docker stop` and it looks like the
>> problem disappears then.
>>
>> To summarize: I suspect the cause to be connected to the mon daemons. I
>> have found that similar problems have been reported a couple of times.
>>
>> What is the best way forward? It seems that the general suggestion for
>> such cases is to just "ceph orch redeploy mon", so I did this.
>>
>> Is there any way to find out the root cause to get rid of it?
>> >> Best wishes, >> Manuel >> >> osd-1 # ceph -s >> cluster: >> id: 55633ec3-6c0c-4a02-990c-0f87e0f7a01f >> health: HEALTH_WARN >> 1 clients failing to respond to cache pressure >> 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4 >> Low space hindering backfill (add storage if this doesn't >> resolve itself): 5 pgs backfill_toofull >> >> services: >> mon: 5 daemons, quorum (age 4h), out of quorum: osd-1, osd-2, osd-5, >> osd-4, osd-3 >> mgr: osd-4.oylrhe(active, since 2h), standbys: osd-1, osd-3, >> osd-5.jcfyqe, osd-2 >> mds: 1/1 daemons up, 1 standby >> osd: 180 osds: 180 up (since 4h), 164 in (since 6h); 285 remapped pgs >> rgw: 12 daemons active (6 hosts, 2 zones) >> >> data: >> volumes: 1/1 healthy >> pools: 14 pools, 5322 pgs >> objects: 263.18M objects, 944 TiB >> usage: 1.4 PiB used, 639 TiB / 2.0 PiB avail >> pgs: 25576348/789544299 objects misplaced (3.239%) >> 5026 active+clean >> 291 active+remapped+backfilling >> 5 active+remapped+backfill_toofull >> >> io: >> client: 165 B/s wr, 0 op/s rd, 0 op/s wr >> recovery: 2.3 GiB/s, 652 objects/s >> >> progress: >> Global Recovery Event (53m) >> [==========================..] (remaining: 3m) >> >> osd-1 # ceph health detail >> HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons >> down, quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add >> storage if this doesn't resolve itself): 5 pgs backfill_toofull >> [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure >> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to >> respond to cache pressure client_id: 56229355 >> [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4 >> mon.osd-3 (rank 4) addr [v2: >> 172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of quorum) >> [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this >> doesn't resolve itself): 5 pgs backfill_toofull >> pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87] >> pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107] >> pg 3.3cb is active+remapped+backfill_toofull, acting [100,90,82] >> pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106] >> pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114] >> >> >> osd-1 # journalctl -f -u >> ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxx-1.service >> -- Logs begin at Wed 2021-09-29 08:52:53 CEST. 
-- >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "df", "detail": "detail"} v 0) v1 >> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 >> 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? >> 172.16.62.12:0/2081332311' entity='client.admin' cmd=[{"prefix": "df", >> "detail": "detail"}]: dispatch >> Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:35 osd-1 bash[423735]: debug 2021-09-29T13:05:35.195+0000 >> 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.045+0000 >> 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting >> 85 slow ops, oldest is mon_command([{prefix=config-key set, >> key=mgr/cephadm/host.osd-2}] v 0) >> Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.205+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, >> mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4) >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:46.215+0000 >> 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 173 >> slow ops, oldest is mon_command([{prefix=config-key set, >> key=mgr/cephadm/host.osd-2}] v 0) >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c >> 29405655..29406327) collect timeout, calling fresh election >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN >> 1 clients failing to respond to cache pressure; 1/5 mons down, quorum >> osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this >> doesn't resolve itself): 5 
pgs backfill_toofull >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 >> clients failing to respond to cache pressure >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : >> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to >> respond to cache pressure client_id: 56229355 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons >> down, quorum osd-1,osd-2,osd-5,osd-4 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) >> addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of >> quorum) >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: >> Low space hindering backfill (add storage if this doesn't resolve itself): >> 5 pgs backfill_toofull >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is >> active+remapped+backfill_toofull, acting [145,128,87] >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is >> active+remapped+backfill_toofull, acting [133,24,107] >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is >> active+remapped+backfill_toofull, acting [100,90,82] >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is >> active+remapped+backfill_toofull, acting [155,27,106] >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is >> active+remapped+backfill_toofull, acting [153,73,114] >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >> election >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 >> 7f6e854ba700 1 paxos.0).electionLogic(26610) init, last seen epoch 26610 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.533+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >> election >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 >> 7f6e854ba700 1 paxos.0).electionLogic(26613) init, last seen epoch 26613, >> mid-election, bumping >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop >> unexpected msg >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.551+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 
0) v1 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.554+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.572+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, >> mons osd-1,osd-2,osd-5,osd-4,osd-3 in quorum (ranks 0,1,2,3,4) >> Sep 29 15:05:52 osd-1 bash[423735]: debug 2021-09-29T13:05:52.830+0000 >> 7f6e89cc3700 0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> >> conn(0x55629242f000 0x556289dde000 :6789 s=ACCEPTING pgs=0 cs=0 >> l=0).handle_client_banner accept peer addr is really - (socket is v1: >> 172.16.35.183:47888/0) >> Sep 29 15:05:58 osd-1 bash[423735]: debug 2021-09-29T13:05:58.825+0000 >> 7f6e894c2700 0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> >> conn(0x55629b6e8800 0x5562a32e3800 :6789 s=ACCEPTING pgs=0 cs=0 >> l=0).handle_client_banner accept peer addr is really - (socket is v1: >> 172.16.35.182:42746/0) >> Sep 29 15:06:03 osd-1 bash[423735]: debug 2021-09-29T13:05:59.667+0000 >> 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 266 >> slow ops, oldest is mon_command([{prefix=config-key set, >> key=mgr/cephadm/host.osd-2}] v 0) >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c >> 29405655..29406327) collect timeout, calling fresh election >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN >> 1 clients failing to respond to cache pressure; 1/5 mons down, quorum >> osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this >> doesn't resolve itself): 5 pgs backfill_toofull >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 >> clients 
failing to respond to cache pressure >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : >> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to >> respond to cache pressure client_id: 56229355 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons >> down, quorum osd-1,osd-2,osd-5,osd-4 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) >> addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of >> quorum) >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: >> Low space hindering backfill (add storage if this doesn't resolve itself): >> 5 pgs backfill_toofull >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is >> active+remapped+backfill_toofull, acting [145,128,87] >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is >> active+remapped+backfill_toofull, acting [133,24,107] >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is >> active+remapped+backfill_toofull, acting [100,90,82] >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is >> active+remapped+backfill_toofull, acting [155,27,106] >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is >> active+remapped+backfill_toofull, acting [153,73,114] >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >> election >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 >> 7f6e854ba700 1 paxos.0).electionLogic(26616) init, last seen epoch 26616 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.064+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop >> unexpected msg >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "status", "format": "json-pretty"} v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 >> 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? 
>> 172.16.62.11:0/4154945587' entity='client.admin' cmd=[{"prefix": >> "status", "format": "json-pretty"}]: dispatch >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.068+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.072+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.082+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.287+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.488+0000 >> 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "df", "detail": "detail"} v 0) v1 >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 >> 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? 
>> 172.16.62.11:0/1624876515' entity='client.admin' cmd=[{"prefix": "df", >> "detail": "detail"}]: dispatch >> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.889+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:05 osd-1 bash[423735]: debug 2021-09-29T13:06:05.691+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.073+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.288+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.294+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.393+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:08 osd-1 bash[423735]: debug 2021-09-29T13:06:08.216+0000 >> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.034+0000 >> 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting >> 289 slow ops, oldest is mon_command([{prefix=config-key set, >> key=mgr/cephadm/host.osd-2}] v 0) >> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.064+0000 >> 7f6e87cbf700 1 paxos.0).electionLogic(26617) init, last seen epoch 26617, >> mid-election, bumping >> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.087+0000 >> 7f6e87cbf700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >> election >> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 >> 7f6e854ba700 1 paxos.0).electionLogic(26621) init, last seen epoch 26621, >> mid-election, bumping >> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.110+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.038+0000 >> 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting >> 289 slow ops, oldest is mon_command([{prefix=config-key set, >> key=mgr/cephadm/host.osd-2}] v 0) >> Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.123+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, >> mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4) >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:22.796+0000 >> 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting 423 >> slow ops, oldest is mon_command([{prefix=config-key set, >> key=mgr/cephadm/host.osd-2}] v 0) >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c >> 29405655..29406327) 
collect timeout, calling fresh election >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN >> 1 clients failing to respond to cache pressure; 1/5 mons down, quorum >> osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this >> doesn't resolve itself): 5 pgs backfill_toofull >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 >> clients failing to respond to cache pressure >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : >> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to >> respond to cache pressure client_id: 56229355 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons >> down, quorum osd-1,osd-2,osd-5,osd-4 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) >> addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of >> quorum) >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: >> Low space hindering backfill (add storage if this doesn't resolve itself): >> 5 pgs backfill_toofull >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is >> active+remapped+backfill_toofull, acting [145,128,87] >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is >> active+remapped+backfill_toofull, acting [133,24,107] >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is >> active+remapped+backfill_toofull, acting [100,90,82] >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is >> active+remapped+backfill_toofull, acting [155,27,106] >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is >> active+remapped+backfill_toofull, acting [153,73,114] >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >> election >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 >> 7f6e854ba700 1 paxos.0).electionLogic(26624) init, last seen epoch 26624 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.253+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.254+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop >> unexpected msg >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.256+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.258+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command([{prefix=config-key set, 
key=mgr/cephadm/host.osd-5}] v 0) v1 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 >> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"} v 0) v1 >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 >> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": >> "osd_memory_target"}]: dispatch >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 >> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >> election >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 >> 7f6e854ba700 1 paxos.0).electionLogic(26627) init, last seen epoch 26627, >> mid-election, bumping >> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.282+0000 >> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >> unique device id for md126: fallback method has no model nor serial' >> Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.050+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.250+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.651+0000 >> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >> assign global_id >> >> osd-1 # journalctl -f -u >> ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxxxxxxxxx-1.qkzuas.service >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdc463500 >> client_session(request_renewcaps seq 88463) from client.60598827 v1: >> 172.16.59.39:0/1389838619 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece3a0cfc0 >> client_session(request_renewcaps seq 88463) from client.60598821 v1: >> 172.16.59.39:0/858534994 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece1e24540 >> client_session(request_renewcaps seq 88459) from client.60591845 v1: >> 172.16.59.7:0/1705034209 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece055f340 >> client_session(request_renewcaps seq 88462) from client.60598851 v1: >> 172.16.59.26:0/763945533 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdcb97c00 >> client_session(request_renewcaps seq 88459) from client.60591994 v1: >> 172.16.59.7:0/4158829178 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 
0x55ecdfa9bc00 >> client_session(request_renewcaps seq 86286) from client.60712226 v1: >> 172.16.59.64:0/1098377799 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ec336dc000 >> client_session(request_renewcaps seq 88463) from client.60591563 v1: >> 172.16.59.39:0/1765846930 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdae976c0 >> client_session(request_renewcaps seq 86592) from client.60695401 v1: >> 172.16.59.27:0/2213843285 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdf211a40 >> client_session(request_renewcaps seq 88461) from client.60599085 v1: >> 172.16.59.19:0/1476359719 >> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdec1d340 >> client_session(request_renewcaps seq 88463) from client.60591566 v1: >> 172.16.59.39:0/3197981635 >> > _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx