I have checked the network already. There is no indication of a network
problem: no packets are being dropped, and a load test with iperf shows
good performance.

On 29.10.2019 at 17:44, Bryan Stillwell wrote:
> I would look into a potential network problem. Check for errors on both the server side and on the switch side.
>
> Otherwise I'm not really sure what's going on. Someone else will have to jump into the conversation.
>
> Bryan
>
> On Oct 29, 2019, at 10:38 AM, Thomas Schneider <74cmonty@xxxxxxxxx> wrote:
>> Thanks.
>>
>> 2 of 4 MGR nodes are sick.
>> I have stopped the MGR services on both nodes.
>>
>> When I start the service again on node A, I get this in its log:
>> root@ld5508:~# tail -f /var/log/ceph/ceph-mgr.ld5508.log
>> 2019-10-29 17:32:02.024 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x564582ad5180 0x5645991ca800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2019-10-29 17:32:02.028 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x5645977b5180 0x564582b2e800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2019-10-29 17:32:02.032 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x564582ad5180 0x5645991ca800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2019-10-29 17:32:02.040 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x5645977b5180 0x564582b2e800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2019-10-29 17:32:02.044 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x564582ad5180 0x5645991ca800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2019-10-29 17:32:02.048 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x5645977b5180 0x564582b2e800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2019-10-29 17:32:02.052 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x564582ad5180 0x5645991ca800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2019-10-29 17:32:02.060 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x5645977b5180 0x564582b2e800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
>> 2019-10-29 17:32:02.064 7fe209fe8700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
>> 2019-10-29 17:32:02.064 7fe209fe8700 -1 mgr handle_signal *** Got signal Terminated ***
>> 2019-10-29 17:37:54.319 7f0e26fc1dc0 0 set uid:gid to 64045:64045 (ceph:ceph)
>> 2019-10-29 17:37:54.319 7f0e26fc1dc0 0 ceph version 14.2.4 (65249672c6e6d843510e7e01f8a4b976dcac3db1) nautilus (stable), process ceph-mgr, pid 250399
>> 2019-10-29 17:37:54.319 7f0e26fc1dc0 0 pidfile_write: ignore empty --pid-file
>> 2019-10-29 17:37:54.331 7f0e26fc1dc0 1 mgr[py] Loading python module 'ansible'
>> 2019-10-29 17:37:54.503 7f0e26fc1dc0 1 mgr[py] Loading python module 'balancer'
>> 2019-10-29 17:37:54.531 7f0e26fc1dc0 1 mgr[py] Loading python module 'crash'
>> 2019-10-29 17:37:54.551 7f0e26fc1dc0 1 mgr[py] Loading python module 'dashboard'
>> 2019-10-29 17:37:54.915 7f0e26fc1dc0 1 mgr[py] Loading python module 'deepsea'
>> 2019-10-29 17:37:55.071 7f0e26fc1dc0 1 mgr[py] Loading python module 'devicehealth'
>> 2019-10-29 17:37:55.103 7f0e26fc1dc0 1 mgr[py] Loading python module 'influx'
>> 2019-10-29 17:37:55.127 7f0e26fc1dc0 1 mgr[py] Loading python module 'insights'
>> 2019-10-29 17:37:55.207 7f0e26fc1dc0 1 mgr[py] Loading python module 'iostat'
>> 2019-10-29 17:37:55.227 7f0e26fc1dc0 1 mgr[py] Loading python module 'localpool'
>> 2019-10-29 17:37:55.247 7f0e26fc1dc0 1 mgr[py] Loading python module 'orchestrator_cli'
>> 2019-10-29 17:37:55.295 7f0e26fc1dc0 1 mgr[py] Loading python module 'pg_autoscaler'
>> 2019-10-29 17:37:55.347 7f0e26fc1dc0 1 mgr[py] Loading python module 'progress'
>> 2019-10-29 17:37:55.387 7f0e26fc1dc0 1 mgr[py] Loading python module 'prometheus'
>> 2019-10-29 17:37:55.599 7f0e26fc1dc0 1 mgr[py] Loading python module 'rbd_support'
>> 2019-10-29 17:37:55.647 7f0e26fc1dc0 1 mgr[py] Loading python module 'restful'
>> 2019-10-29 17:37:55.959 7f0e26fc1dc0 1 mgr[py] Loading python module 'selftest'
>> 2019-10-29 17:37:55.983 7f0e26fc1dc0 1 mgr[py] Loading python module 'status'
>> 2019-10-29 17:37:56.015 7f0e26fc1dc0 1 mgr[py] Loading python module 'telegraf'
>> 2019-10-29 17:37:56.051 7f0e26fc1dc0 1 mgr[py] Loading python module 'telemetry'
>> 2019-10-29 17:37:56.331 7f0e26fc1dc0 1 mgr[py] Loading python module 'test_orchestrator'
>> 2019-10-29 17:37:56.399 7f0e26fc1dc0 1 mgr[py] Loading python module 'volumes'
>> 2019-10-29 17:37:56.459 7f0e26fc1dc0 1 mgr[py] Loading python module 'zabbix'
>> 2019-10-29 17:37:56.503 7f0e21cdd700 1 mgr load Constructed class from module: dashboard
>> 2019-10-29 17:37:56.503 7f0e214dc700 0 ms_deliver_dispatch: unhandled message 0x56346f978400 mon_map magic: 0 v1 from mon.0 v2:10.97.206.93:3300/0
>> 2019-10-29 17:37:56.507 7f0e214dc700 0 client.0 ms_handle_reset on v2:10.97.206.93:6912/22258
>> 2019-10-29 17:37:56.743 7f0e16363700 0 mgr[dashboard] [29/Oct/2019:17:37:56] ENGINE Error in HTTPServer.tick
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line 2021, in start
>>     self.tick()
>>   File "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line 2090, in tick
>>     s, ssl_env = self.ssl_adapter.wrap(s)
>>   File "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/ssl_builtin.py", line 67, in wrap
>>     server_side=True)
>>   File "/usr/lib/python2.7/ssl.py", line 369, in wrap_socket
>>     _context=self)
>>   File "/usr/lib/python2.7/ssl.py", line 599, in __init__
>>     self.do_handshake()
>>   File "/usr/lib/python2.7/ssl.py", line 828, in do_handshake
>>     self._sslobj.do_handshake()
>> error: [Errno 0] Error
>>
>> ^C
>>
>> This looks like a severe issue.
>>
>>
>> On 29.10.2019 at 17:22, Bryan Stillwell wrote:
>>> On Oct 29, 2019, at 9:44 AM, Thomas Schneider <74cmonty@xxxxxxxxx> wrote:
>>>> In my unhealthy cluster I cannot run several ceph osd commands because
>>>> they hang, e.g.
>>>> ceph osd df
>>>> ceph osd pg dump
>>>>
>>>> Also, ceph balancer status hangs.
>>>>
>>>> How can I fix this issue?
>>> Check the status of your ceph-mgr processes (restart them if needed and check the logs for more details). Those are responsible for handling those commands in recent releases.
>>>
>>> Bryan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx