Dear all,

again answering my own emails... It turns out that the connection resets are
not problematic. I took the liberty of documenting this in the tracker in the
hope that users with similar issues can find my results.

https://tracker.ceph.com/issues/52825

As for my original issue: As described earlier, I have gone and `ceph orch
rm`ed all but the bare essentials. I then continued adding back daemon by
daemon and now also have the dashboard and my second rgw daemon running. After
enabling each, I waited overnight to see whether the problem reappeared - so
far everything works fine. I don't know what caused the problem, and I will do
more debugging once it reappears.

As for the problem with two ingress services and two rgws, I have opened a
ticket here:

https://tracker.ceph.com/issues/52826

Cheers,
Manuel

On Sun, Oct 3, 2021 at 11:47 AM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:

> After still more digging, I found high numbers of failed connection attempts
> on my osd nodes, see the netstat output at the bottom (nstat is also useful
> as it allows resetting the counters). The number of failed connection
> attempts looked too high. I found an old thread on the mailing list that
> recommended enabling logging of reset connections to syslog:
>
> ```
> iptables -I INPUT -p tcp -m tcp --tcp-flags RST RST -j LOG
> ```
>
> This was very useful and I saw a lot of failed connection attempts to port
> 8443, so something related to the dashboard. I also noticed a lot of "beast"
> error messages, which appear to be related to RGW. So I stopped everything
> except for the bare essentials of mds, mgr, mon, and osd.
>
> The cluster appeared to stabilize after a full reboot. It is hard to judge
> whether this would hold for long - the problem sometimes appeared only after
> several hours. Next, I deployed prometheus with `ceph orch` and everything
> remained OK.
>
> I then began to `ceph orch deploy rgw` for my default realm, which caused no
> apparent problem. Once I deployed the ingress service for this with the
> following YAML:
>
> ```
> service_type: ingress
> service_id: rgw.default
> placement:
>   count: 6
> spec:
>   backend_service: rgw.default
>   virtual_ip: 172.16.62.26/19
>   frontend_port: 443
>   monitor_port: 1967
>   ssl_cert: |
>     -----BEGIN PRIVATE KEY-----
>     ...
> ```
>
> I began seeing **a lot** of beast debug messages as follows:
>
> ```
> Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.096+0000 7f80a9398700 1 ====== req done req=0x7f81e6127620 op status=0 http_status=200 latency=0.000999998s ======
> Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.096+0000 7f80a9398700 1 beast: 0x7f81e6127620: 172.16.62.11 - anonymous [03/Oct/2021:09:21:00.095 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000999998s
> Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f80d1be9700 1 ====== starting new request req=0x7f81e6127620 =====
> Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f819d580700 0 ERROR: client_io->complete_request() returned Connection reset by peer
> Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f819d580700 1 ====== req done req=0x7f81e6127620 op status=0 http_status=200 latency=0.000000000s ======
> Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.568+0000 7f819d580700 1 beast: 0x7f81e6127620: 172.16.62.12 - anonymous [03/Oct/2021:09:21:00.568 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000000000s
> Oct 3 11:21:00 osd-6 bash: debug 2021-10-03T09:21:00.583+0000 7f80c1bc9700 1 ====== starting new request req=0x7f81e6127620 =====
> ```
>
> and the TCP connection reset counters started to jump again (the monitors
> still remained stable). To me this indicates that haproxy is most probably
> the culprit for the high number of "connection resets received", maybe
> unrelated to my cluster stability issue. Also see the health check setting
> `option httpchk HEAD / HTTP/1.0`; the full haproxy and keepalived config is
> below.
>
> This leads me to the question:
>
> - Is this normal / to be expected?
>
> I found this StackOverflow thread:
>
> - https://stackoverflow.com/questions/21550337/haproxy-netty-way-to-prevent-exceptions-on-connection-reset/40005338#40005338
>
> I now have the following `ceph orch ls` output and will wait overnight to see
> whether things remain stable. It is my feeling that prometheus should not
> destabilize things, and I can live with the other services being disabled for
> a while.
>
> ```
> # ceph orch ls
> NAME                 PORTS                  RUNNING  REFRESHED  AGE  PLACEMENT
> ingress.rgw.default  172.16.62.26:443,1967  12/12    5m ago     16m  count:6
> mds.cephfs                                  2/2      4m ago     4d   count-per-host:1;label:mds
> mgr                                         5/5      5m ago     5d   count:5
> mon                                         5/5      5m ago     2d   count:5
> osd.unmanaged                               180/180  5m ago     -    <unmanaged>
> prometheus           ?:9095                 2/2      3m ago     18m  count:2
> rgw.default          ?:8000                 6/6      5m ago     25m  count-per-host:1;label:rgw
> ```
>
> Cheers,
> Manuel
>
> ```
> # output of netstat -s | grep -A 10 ^Tcp:
> + ssh osd-1 netstat -s
> Tcp:
>     1043521 active connections openings
>     449583 passive connection openings
>     28923 failed connection attempts
>     310376 connection resets received
>     12100 connections established
>     389101110 segments received
>     590111283 segments send out
>     722988 segments retransmited
>     180 bad segments received.
>     260749 resets sent
> ```
>
> ```
> # cat /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/keepalived.rgw.default.osd-1.vrjiew/keepalived.conf
> # This file is generated by cephadm.
> vrrp_script check_backend {
>     script "/usr/bin/curl http://localhost:1967/health"
>     weight -20
>     interval 2
>     rise 2
>     fall 2
> }
>
> vrrp_instance VI_0 {
>   state MASTER
>   priority 100
>   interface bond0
>   virtual_router_id 51
>   advert_int 1
>   authentication {
>       auth_type PASS
>       auth_pass qghwhcnanqsltihgtpsm
>   }
>   unicast_src_ip 172.16.62.10
>   unicast_peer {
>       172.16.62.11
>       172.16.62.12
>       172.16.62.13
>       172.16.62.30
>       172.16.62.31
>   }
>   virtual_ipaddress {
>       172.16.62.26/19 dev bond0
>   }
>   track_script {
>       check_backend
>   }
>
> # cat /var/lib/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f/haproxy.rgw.default.osd-1.urpnuu/haproxy/haproxy.cfg
> # This file is generated by cephadm.
> global
>     log 127.0.0.1 local2
>     chroot /var/lib/haproxy
>     pidfile /var/lib/haproxy/haproxy.pid
>     maxconn 8000
>     daemon
>     stats socket /var/lib/haproxy/stats
>
> defaults
>     mode http
>     log global
>     option httplog
>     option dontlognull
>     option http-server-close
>     option forwardfor except 127.0.0.0/8
>     option redispatch
>     retries 3
>     timeout queue 20s
>     timeout connect 5s
>     timeout http-request 1s
>     timeout http-keep-alive 5s
>     timeout client 1s
>     timeout server 1s
>     timeout check 5s
>     maxconn 8000
>
> frontend stats
>     mode http
>     bind *:1967
>     stats enable
>     stats uri /stats
>     stats refresh 10s
>     stats auth admin:ivlgujuagrksajemsqyg
>     http-request use-service prometheus-exporter if { path /metrics }
>     monitor-uri /health
>
> frontend frontend
>     bind *:443 ssl crt /var/lib/haproxy/haproxy.pem
>     default_backend backend
>
> backend backend
>     option forwardfor
>     balance static-rr
>     option httpchk HEAD / HTTP/1.0
>     server rgw.default.osd-1.xqrjwp 172.16.62.10:8000 check weight 100
>     server rgw.default.osd-2.lopjij 172.16.62.11:8000 check weight 100
>     server rgw.default.osd-3.plbqka 172.16.62.12:8000 check weight 100
>     server rgw.default.osd-4.jvkhen 172.16.62.13:8000 check weight 100
>     server rgw.default.osd-5.hjxnrb 172.16.62.30:8000 check weight 100
>     server rgw.default.osd-6.bdrxdd 172.16.62.31:8000 check weight 100
> ```
>
> On Sat, Oct 2, 2021 at 2:32 PM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:
>
>> Dear all,
>>
>> I previously sent an email to the list regarding something that I called a
>> "leader election" loop. The problem has reappeared several times and I don't
>> know how to proceed with debugging or fixing it.
>>
>> I have 6 nodes osd-{1..6} and monitors are on osd-{1..5}. I run ceph 15.2.14
>> using cephadm on CentOS 7.9 (kernel 3.10.0-1160.42.2.el7.x86_64).
>>
>> The symptoms are (also see my previous email):
>>
>> - `ceph -s` takes a long time or does not return
>> --- I sometimes see messages "monclient: get_monmap_and_config failed to get config"
>> --- I sometimes see messages "problem getting command descriptions from mon.osd-2" (it always works via the admin socket, of course)
>> - I sometimes see all daemons out of quorum in `ceph -s`
>> - different monitors go out of quorum and come back in
>> - leader election is reinitiated every few seconds
>> - the monitors appear to transition correctly between "electing" and "peon", but the issue is that leader election is performed every few seconds...
>>
>> I have done all the checks in the "troubleshooting monitors" documentation up
>> to the point where it says "reach out to the community". In particular, I
>> checked the mon_stats and each monitor sees all others on the correct public
>> IP, and I can telnet to ports 3300 and 6789 from each monitor to all others.
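>>
>> For reference, that connectivity check can be scripted roughly like this (a
>> minimal sketch; it assumes the monitor host names osd-1 through osd-5 resolve
>> to their public network addresses and that bash and coreutils' timeout are
>> available, which is the case on these nodes):
>>
>> ```
>> # minimal sketch: check that the msgr v2 (3300) and v1 (6789) monitor ports
>> # are reachable from this node; run it on every monitor host
>> for host in osd-1 osd-2 osd-3 osd-4 osd-5; do
>>   for port in 3300 6789; do
>>     if timeout 2 bash -c "</dev/tcp/$host/$port" 2>/dev/null; then
>>       echo "OK   $host:$port"
>>     else
>>       echo "FAIL $host:$port"
>>     fi
>>   done
>> done
>> ```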
>>
>> I have bumped the nf_conntrack settings, although I don't have any entries in
>> the syslog yet about dropped packets. `netstat -s` shows a few dropped
>> packets (e.g., 172 outgoing dropped, 18 dropped because of missing route).
>>
>> Also, I have added public servers and the cluster itself to chrony.conf (see
>> below). The output of `chronyc sources -v` indicates to me that the cluster
>> itself is in sync and clock skew is below 10 ns.
>>
>> I was able to inject the debug level 10/10 increase into the monitors; I had
>> to repeat this for one out-of-quorum monitor that first said "Error ENXIO:
>> problem getting command descriptions from mon.osd-5" but then accepted it via
>> `ceph tell`.
>>
>> I have pulled the logs for two minutes while the cluster was running its
>> leader election loop and attached them. They are a couple of thousand lines
>> each and should show the problem. I'd be happy to send fewer or more lines,
>> though.
>>
>> I'd be happy about any help or suggestions towards a resolution.
>>
>> Best wishes,
>> Manuel
>>
>> ```
>> # 2>/dev/null sysctl -a | grep nf_ | egrep 'max|bucket'
>> net.netfilter.nf_conntrack_buckets = 2500096
>> net.netfilter.nf_conntrack_expect_max = 39060
>> net.netfilter.nf_conntrack_max = 10000000
>> net.netfilter.nf_conntrack_tcp_max_retrans = 3
>> net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
>> net.nf_conntrack_max = 10000000
>> ```
>>
>> ```
>> # from chrony.conf
>> server 172.16.35.140 iburst
>> server 172.16.35.141 iburst
>> server 172.16.35.142 iburst
>> server osd-1 iburst
>> server osd-2 iburst
>> server osd-3 iburst
>> server osd-4 iburst
>> server osd-5 iburst
>> server osd-6 iburst
>> server 0.de.pool.ntp.org iburst
>> server 1.de.pool.ntp.org iburst
>> server 2.de.pool.ntp.org iburst
>> server 3.de.pool.ntp.org iburst
>> ```
>>
>> ```
>> # chronyc sources -v
>> 210 Number of sources = 13
>>
>>   .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
>>  / .- Source state '*' = current synced, '+' = combined , '-' = not combined,
>> | /   '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
>> ||                                                 .- xxxx [ yyyy ] +/- zzzz
>> ||      Reachability register (octal) -.           |  xxxx = adjusted offset,
>> ||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
>> ||                                \     |          |  zzzz = estimated error.
>> ||                                 |    |           \
>> MS Name/IP address         Stratum Poll Reach LastRx Last sample
>> ===============================================================================
>> ^- 172.16.35.140                 3   6   377    55    +213us[ +213us] +/-   26ms
>> ^+ 172.16.35.141                 2   6   377    63    +807us[ +807us] +/-   12ms
>> ^+ 172.16.35.142                 3   9   377   253   +1488us[+1488us] +/- 7675us
>> ^+ osd-1                         3   6   377    62    +145us[ +145us] +/- 7413us
>> ^+ osd-2                         2   6   377    61   -6577ns[-6577ns] +/- 8108us
>> ^+ osd-3                         4   6   377    50    +509us[ +509us] +/- 6810us
>> ^+ osd-4                         4   6   377    54    +447us[ +447us] +/- 7231us
>> ^+ osd-5                         3   6   377    52    +252us[ +252us] +/- 6738us
>> ^+ osd-6                         2   6   377    56     -13us[  -13us] +/- 8563us
>> ^+ funky.f5s.de                  2   8   377   207    +371us[ +371us] +/-   24ms
>> ^- hetzner01.ziegenberg.at       2  10   377   445    +735us[ +685us] +/-   32ms
>> ^* time1.uni-paderborn.de        1   9   377   253   -4246us[-4297us] +/- 9089us
>> ^- 25000-021.cloud.services>     2  10   377   147    +832us[ +832us] +/-   48ms
>> ```
>>
>> On Wed, Sep 29, 2021 at 3:43 PM Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:
>>
>>> Dear all,
>>>
>>> I was a bit too optimistic in my previous email. It looks like the leader
>>> election loop reappeared.
>>> I could fix it by stopping the rogue mon daemon, but I don't know how to fix
>>> it for good.
>>>
>>> I'm running a 16.2.6 Ceph cluster on CentOS 7.9 servers (6 servers in
>>> total). I have about 35 HDDs and 4 SSDs in each server. The servers have
>>> about 250 GB of RAM; there is no memory pressure on any daemon. I have an
>>> identical mirror cluster that does not have the issue (but that one does not
>>> have its file system mounted elsewhere and is running no rgws). I recently
>>> migrated both clusters to cephadm and then from octopus to pacific.
>>>
>>> The primary cluster has problems (the following was pulled from the cluster
>>> before fixing/restarting the mon daemon):
>>>
>>> - `ceph -s` and other commands feel pretty sluggish
>>> - `ceph -s` shows inconsistent results in the "health" section and "services" overview
>>> - cephfs clients hang, and after rebooting the clients, mounting is not possible any more
>>> - `ceph config dump` prints "monclient: get_monmap_and_config failed to get config"
>>> - I have a mon leader election loop, shown in its journalctl output at the bottom
>>> - the primary mds daemon says things like "skipping upkeep work because connection to Monitors appears laggy" and "ms_deliver_dispatch: unhandled message 0x55ecdec1d340 client_session(request_renewcaps seq 88463) from client.60591566 v1:172.16.59.39:0/3197981635" in its journalctl output
>>>
>>> I tried to reboot the client that is supposedly not reacting to cache
>>> pressure, but that did not help either. The servers are connected to the
>>> same VLT switch pair and use LACP 2x40GbE for the cluster network and
>>> 2x10GbE for the public network. I have disabled firewalld on the nodes, but
>>> that did not fix the problem either. I suspect that the "laggy monitors" are
>>> more probably caused on the software side than on the network side.
>>>
>>> I took down the rogue mon.osd-1 with `docker stop` and it looks like the
>>> problem disappears then.
>>>
>>> To summarize: I suspect the cause is connected to the mon daemons. I have
>>> found that similar problems have been reported a couple of times.
>>>
>>> What is the best way forward? It seems that the general suggestion for such
>>> cases is to just "ceph orch redeploy mon", so I did this.
>>>
>>> Is there any way to find out the root cause to get rid of it?
>>>
>>> Best wishes,
>>> Manuel
>>>
>>> osd-1 # ceph -s
>>>   cluster:
>>>     id:     55633ec3-6c0c-4a02-990c-0f87e0f7a01f
>>>     health: HEALTH_WARN
>>>             1 clients failing to respond to cache pressure
>>>             1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4
>>>             Low space hindering backfill (add storage if this doesn't resolve itself): 5 pgs backfill_toofull
>>>
>>>   services:
>>>     mon: 5 daemons, quorum (age 4h), out of quorum: osd-1, osd-2, osd-5, osd-4, osd-3
>>>     mgr: osd-4.oylrhe(active, since 2h), standbys: osd-1, osd-3, osd-5.jcfyqe, osd-2
>>>     mds: 1/1 daemons up, 1 standby
>>>     osd: 180 osds: 180 up (since 4h), 164 in (since 6h); 285 remapped pgs
>>>     rgw: 12 daemons active (6 hosts, 2 zones)
>>>
>>>   data:
>>>     volumes: 1/1 healthy
>>>     pools:   14 pools, 5322 pgs
>>>     objects: 263.18M objects, 944 TiB
>>>     usage:   1.4 PiB used, 639 TiB / 2.0 PiB avail
>>>     pgs:     25576348/789544299 objects misplaced (3.239%)
>>>              5026 active+clean
>>>              291  active+remapped+backfilling
>>>              5    active+remapped+backfill_toofull
>>>
>>>   io:
>>>     client:   165 B/s wr, 0 op/s rd, 0 op/s wr
>>>     recovery: 2.3 GiB/s, 652 objects/s
>>>
>>>   progress:
>>>     Global Recovery Event (53m)
>>>       [==========================..]
(remaining: 3m) >>> >>> osd-1 # ceph health detail >>> HEALTH_WARN 1 clients failing to respond to cache pressure; 1/5 mons >>> down, quorum osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add >>> storage if this doesn't resolve itself): 5 pgs backfill_toofull >>> [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure >>> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing >>> to respond to cache pressure client_id: 56229355 >>> [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-2,osd-5,osd-4 >>> mon.osd-3 (rank 4) addr [v2: >>> 172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of quorum) >>> [WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if >>> this doesn't resolve itself): 5 pgs backfill_toofull >>> pg 3.23d is active+remapped+backfill_toofull, acting [145,128,87] >>> pg 3.33f is active+remapped+backfill_toofull, acting [133,24,107] >>> pg 3.3cb is active+remapped+backfill_toofull, acting [100,90,82] >>> pg 3.3fc is active+remapped+backfill_toofull, acting [155,27,106] >>> pg 3.665 is active+remapped+backfill_toofull, acting [153,73,114] >>> >>> >>> osd-1 # journalctl -f -u >>> ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxx-1.service >>> -- Logs begin at Wed 2021-09-29 08:52:53 CEST. -- >>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": >>> "osd_memory_target"} v 0) v1 >>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.214+0000 >>> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": >>> "osd_memory_target"}]: dispatch >>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 >>> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.398+0000 >>> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 >>> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.799+0000 >>> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command({"prefix": "df", "detail": "detail"} v 0) v1 >>> Sep 29 15:05:32 osd-1 bash[423735]: debug 2021-09-29T13:05:32.810+0000 >>> 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? 
>>> 172.16.62.12:0/2081332311' entity='client.admin' cmd=[{"prefix": "df", >>> "detail": "detail"}]: dispatch >>> Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 >>> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:05:33 osd-1 bash[423735]: debug 2021-09-29T13:05:33.600+0000 >>> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:05:35 osd-1 bash[423735]: debug 2021-09-29T13:05:35.195+0000 >>> 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.045+0000 >>> 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting >>> 85 slow ops, oldest is mon_command([{prefix=config-key set, >>> key=mgr/cephadm/host.osd-2}] v 0) >>> Sep 29 15:05:37 osd-1 bash[423735]: debug 2021-09-29T13:05:37.205+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, >>> mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4) >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:46.215+0000 >>> 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command >>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting >>> 173 slow ops, oldest is mon_command([{prefix=config-key set, >>> key=mgr/cephadm/host.osd-2}] v 0) >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c >>> 29405655..29406327) collect timeout, calling fresh election >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN >>> 1 clients failing to respond to cache pressure; 1/5 mons down, quorum >>> osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this >>> doesn't resolve itself): 5 pgs backfill_toofull >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 >>> clients failing to respond to cache pressure >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : >>> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to >>> respond to cache pressure client_id: 56229355 >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons >>> down, quorum osd-1,osd-2,osd-5,osd-4 >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) >>> addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of >>> quorum) >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: >>> Low space hindering backfill (add storage if this doesn't resolve itself): >>> 5 pgs backfill_toofull >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is >>> active+remapped+backfill_toofull, acting [145,128,87] >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 0 
log_channel(cluster) log [WRN] : pg 3.33f is >>> active+remapped+backfill_toofull, acting [133,24,107] >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is >>> active+remapped+backfill_toofull, acting [100,90,82] >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is >>> active+remapped+backfill_toofull, acting [155,27,106] >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.508+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is >>> active+remapped+backfill_toofull, acting [153,73,114] >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 >>> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >>> election >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.509+0000 >>> 7f6e854ba700 1 paxos.0).electionLogic(26610) init, last seen epoch 26610 >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.533+0000 >>> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >>> unique device id for md126: fallback method has no model nor serial' >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 >>> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >>> election >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.538+0000 >>> 7f6e854ba700 1 paxos.0).electionLogic(26613) init, last seen epoch 26613, >>> mid-election, bumping >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 >>> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >>> unique device id for md126: fallback method has no model nor serial' >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.547+0000 >>> 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop >>> unexpected msg >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.551+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 >>> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.554+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.555+0000 >>> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": >>> "osd_memory_target"} v 0) v1 >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 >>> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": >>> "osd_memory_target"}]: dispatch >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": >>> "osd_memory_target"} v 0) v1 >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.565+0000 >>> 
7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": >>> "osd_memory_target"}]: dispatch >>> Sep 29 15:05:51 osd-1 bash[423735]: debug 2021-09-29T13:05:51.572+0000 >>> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, >>> mons osd-1,osd-2,osd-5,osd-4,osd-3 in quorum (ranks 0,1,2,3,4) >>> Sep 29 15:05:52 osd-1 bash[423735]: debug 2021-09-29T13:05:52.830+0000 >>> 7f6e89cc3700 0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> >>> conn(0x55629242f000 0x556289dde000 :6789 s=ACCEPTING pgs=0 cs=0 >>> l=0).handle_client_banner accept peer addr is really - (socket is v1: >>> 172.16.35.183:47888/0) >>> Sep 29 15:05:58 osd-1 bash[423735]: debug 2021-09-29T13:05:58.825+0000 >>> 7f6e894c2700 0 --1- [v2:172.16.62.10:3300/0,v1:172.16.62.10:6789/0] >> >>> conn(0x55629b6e8800 0x5562a32e3800 :6789 s=ACCEPTING pgs=0 cs=0 >>> l=0).handle_client_banner accept peer addr is really - (socket is v1: >>> 172.16.35.182:42746/0) >>> Sep 29 15:06:03 osd-1 bash[423735]: debug 2021-09-29T13:05:59.667+0000 >>> 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command >>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) v1 >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting >>> 266 slow ops, oldest is mon_command([{prefix=config-key set, >>> key=mgr/cephadm/host.osd-2}] v 0) >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c >>> 29405655..29406327) collect timeout, calling fresh election >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN >>> 1 clients failing to respond to cache pressure; 1/5 mons down, quorum >>> osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this >>> doesn't resolve itself): 5 pgs backfill_toofull >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 >>> clients failing to respond to cache pressure >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : >>> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to >>> respond to cache pressure client_id: 56229355 >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons >>> down, quorum osd-1,osd-2,osd-5,osd-4 >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) >>> addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of >>> quorum) >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: >>> Low space hindering backfill (add storage if this doesn't resolve itself): >>> 5 pgs backfill_toofull >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is >>> active+remapped+backfill_toofull, acting [145,128,87] >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is >>> 
active+remapped+backfill_toofull, acting [133,24,107] >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is >>> active+remapped+backfill_toofull, acting [100,90,82] >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is >>> active+remapped+backfill_toofull, acting [155,27,106] >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.034+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is >>> active+remapped+backfill_toofull, acting [153,73,114] >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 >>> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >>> election >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.058+0000 >>> 7f6e854ba700 1 paxos.0).electionLogic(26616) init, last seen epoch 26616 >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.064+0000 >>> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >>> unique device id for md126: fallback method has no model nor serial' >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 >>> 7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop >>> unexpected msg >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command({"prefix": "status", "format": "json-pretty"} v 0) v1 >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.065+0000 >>> 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? >>> 172.16.62.11:0/4154945587' entity='client.admin' cmd=[{"prefix": >>> "status", "format": "json-pretty"}]: dispatch >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.068+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.072+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.082+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": >>> "osd_memory_target"} v 0) v1 >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 >>> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": >>> "osd_memory_target"}]: dispatch >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": >>> "osd_memory_target"} v 0) v1 >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.083+0000 >>> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": >>> "osd_memory_target"}]: dispatch >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.287+0000 >>> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.488+0000 >>> 7f6e89cc3700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to 
>>> assign global_id >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command({"prefix": "df", "detail": "detail"} v 0) v1 >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.719+0000 >>> 7f6e854ba700 0 log_channel(audit) log [DBG] : from='client.? >>> 172.16.62.11:0/1624876515' entity='client.admin' cmd=[{"prefix": "df", >>> "detail": "detail"}]: dispatch >>> Sep 29 15:06:04 osd-1 bash[423735]: debug 2021-09-29T13:06:04.889+0000 >>> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:06:05 osd-1 bash[423735]: debug 2021-09-29T13:06:05.691+0000 >>> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.073+0000 >>> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.288+0000 >>> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.294+0000 >>> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:06:07 osd-1 bash[423735]: debug 2021-09-29T13:06:07.393+0000 >>> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:06:08 osd-1 bash[423735]: debug 2021-09-29T13:06:08.216+0000 >>> 7f6e894c2700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.034+0000 >>> 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting >>> 289 slow ops, oldest is mon_command([{prefix=config-key set, >>> key=mgr/cephadm/host.osd-2}] v 0) >>> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.064+0000 >>> 7f6e87cbf700 1 paxos.0).electionLogic(26617) init, last seen epoch 26617, >>> mid-election, bumping >>> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.087+0000 >>> 7f6e87cbf700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >>> unique device id for md126: fallback method has no model nor serial' >>> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 >>> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >>> election >>> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.101+0000 >>> 7f6e854ba700 1 paxos.0).electionLogic(26621) init, last seen epoch 26621, >>> mid-election, bumping >>> Sep 29 15:06:09 osd-1 bash[423735]: debug 2021-09-29T13:06:09.110+0000 >>> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >>> unique device id for md126: fallback method has no model nor serial' >>> Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.038+0000 >>> 7f6e87cbf700 -1 mon.osd-1@0(electing) e11 get_health_metrics reporting >>> 289 slow ops, oldest is mon_command([{prefix=config-key set, >>> key=mgr/cephadm/host.osd-2}] v 0) >>> Sep 29 15:06:14 osd-1 bash[423735]: debug 2021-09-29T13:06:14.123+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [INF] : mon.osd-1 is new leader, >>> mons osd-1,osd-5,osd-4,osd-3 in quorum (ranks 0,2,3,4) >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:22.796+0000 >>> 7f6e854ba700 0 mon.osd-1@0(leader) e11 handle_command >>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-2}] v 0) 
v1 >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 -1 mon.osd-1@0(leader) e11 get_health_metrics reporting >>> 423 slow ops, oldest is mon_command([{prefix=config-key set, >>> key=mgr/cephadm/host.osd-2}] v 0) >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 1 mon.osd-1@0(leader).paxos(paxos recovering c >>> 29405655..29406327) collect timeout, calling fresh election >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN >>> 1 clients failing to respond to cache pressure; 1/5 mons down, quorum >>> osd-1,osd-2,osd-5,osd-4; Low space hindering backfill (add storage if this >>> doesn't resolve itself): 5 pgs backfill_toofull >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_RECALL: 1 >>> clients failing to respond to cache pressure >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : >>> mds.cephfs.osd-1.qkzuas(mds.0): Client med-file1:med-file1 failing to >>> respond to cache pressure client_id: 56229355 >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons >>> down, quorum osd-1,osd-2,osd-5,osd-4 >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : mon.osd-3 (rank 4) >>> addr [v2:172.16.62.12:3300/0,v1:172.16.62.12:6789/0] is down (out of >>> quorum) >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : [WRN] PG_BACKFILL_FULL: >>> Low space hindering backfill (add storage if this doesn't resolve itself): >>> 5 pgs backfill_toofull >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.23d is >>> active+remapped+backfill_toofull, acting [145,128,87] >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.33f is >>> active+remapped+backfill_toofull, acting [133,24,107] >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3cb is >>> active+remapped+backfill_toofull, acting [100,90,82] >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.3fc is >>> active+remapped+backfill_toofull, acting [155,27,106] >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.223+0000 >>> 7f6e87cbf700 0 log_channel(cluster) log [WRN] : pg 3.665 is >>> active+remapped+backfill_toofull, acting [153,73,114] >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 >>> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >>> election >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.224+0000 >>> 7f6e854ba700 1 paxos.0).electionLogic(26624) init, last seen epoch 26624 >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.253+0000 >>> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >>> unique device id for md126: fallback method has no model nor serial' >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.254+0000 >>> 
7f6e854ba700 1 mon.osd-1@0(electing) e11 handle_timecheck drop >>> unexpected msg >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.256+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-6}] v 0) v1 >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.258+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1 >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command({"prefix": "config rm", "who": "osd/host:osd-4", "name": >>> "osd_memory_target"} v 0) v1 >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 >>> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-4", "name": >>> "osd_memory_target"}]: dispatch >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 >>> 7f6e854ba700 0 mon.osd-1@0(electing) e11 handle_command >>> mon_command({"prefix": "config rm", "who": "osd/host:osd-3", "name": >>> "osd_memory_target"} v 0) v1 >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.259+0000 >>> 7f6e854ba700 0 log_channel(audit) log [INF] : from='mgr.66351528 ' >>> entity='' cmd=[{"prefix": "config rm", "who": "osd/host:osd-3", "name": >>> "osd_memory_target"}]: dispatch >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 >>> 7f6e854ba700 0 log_channel(cluster) log [INF] : mon.osd-1 calling monitor >>> election >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.273+0000 >>> 7f6e854ba700 1 paxos.0).electionLogic(26627) init, last seen epoch 26627, >>> mid-election, bumping >>> Sep 29 15:06:27 osd-1 bash[423735]: debug 2021-09-29T13:06:27.282+0000 >>> 7f6e854ba700 1 mon.osd-1@0(electing) e11 collect_metadata md126: no >>> unique device id for md126: fallback method has no model nor serial' >>> Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.050+0000 >>> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.250+0000 >>> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> Sep 29 15:06:28 osd-1 bash[423735]: debug 2021-09-29T13:06:28.651+0000 >>> 7f6e844b8700 1 mon.osd-1@0(electing) e11 handle_auth_request failed to >>> assign global_id >>> >>> osd-1 # journalctl -f -u >>> ceph-55633ec3-6c0c-4a02-990c-0f87e0f7a01f@xxxxxxxxxxxxxx-1.qkzuas.service >>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >>> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdc463500 >>> client_session(request_renewcaps seq 88463) from client.60598827 v1: >>> 172.16.59.39:0/1389838619 >>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >>> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece3a0cfc0 >>> client_session(request_renewcaps seq 88463) from client.60598821 v1: >>> 172.16.59.39:0/858534994 >>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >>> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ece1e24540 >>> client_session(request_renewcaps seq 88459) from client.60591845 v1: >>> 172.16.59.7:0/1705034209 >>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >>> 7f994ec61700 0 
ms_deliver_dispatch: unhandled message 0x55ece055f340 >>> client_session(request_renewcaps seq 88462) from client.60598851 v1: >>> 172.16.59.26:0/763945533 >>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >>> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdcb97c00 >>> client_session(request_renewcaps seq 88459) from client.60591994 v1: >>> 172.16.59.7:0/4158829178 >>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >>> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdfa9bc00 >>> client_session(request_renewcaps seq 86286) from client.60712226 v1: >>> 172.16.59.64:0/1098377799 >>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >>> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ec336dc000 >>> client_session(request_renewcaps seq 88463) from client.60591563 v1: >>> 172.16.59.39:0/1765846930 >>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >>> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdae976c0 >>> client_session(request_renewcaps seq 86592) from client.60695401 v1: >>> 172.16.59.27:0/2213843285 >>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >>> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdf211a40 >>> client_session(request_renewcaps seq 88461) from client.60599085 v1: >>> 172.16.59.19:0/1476359719 >>> Sep 29 15:19:52 osd-1 bash[254093]: debug 2021-09-29T13:19:52.349+0000 >>> 7f994ec61700 0 ms_deliver_dispatch: unhandled message 0x55ecdec1d340 >>> client_session(request_renewcaps seq 88463) from client.60591566 v1: >>> 172.16.59.39:0/3197981635 >>> >> _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx