Thanks. 2 of 4 MGR nodes are sick. I have stopped the MGR services on both nodes. When I start the service again on node A, I get this in its log:

root@ld5508:~# tail -f /var/log/ceph/ceph-mgr.ld5508.log
2019-10-29 17:32:02.024 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x564582ad5180 0x5645991ca800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-10-29 17:32:02.028 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x5645977b5180 0x564582b2e800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-10-29 17:32:02.032 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x564582ad5180 0x5645991ca800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-10-29 17:32:02.040 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x5645977b5180 0x564582b2e800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-10-29 17:32:02.044 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x564582ad5180 0x5645991ca800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-10-29 17:32:02.048 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x5645977b5180 0x564582b2e800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-10-29 17:32:02.052 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x564582ad5180 0x5645991ca800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-10-29 17:32:02.060 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> v1:10.97.206.96:7055/17961 conn(0x5645977b5180 0x564582b2e800 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2019-10-29 17:32:02.064 7fe209fe8700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
2019-10-29 17:32:02.064 7fe209fe8700 -1 mgr handle_signal *** Got signal Terminated ***
2019-10-29 17:37:54.319 7f0e26fc1dc0 0 set uid:gid to 64045:64045 (ceph:ceph)
2019-10-29 17:37:54.319 7f0e26fc1dc0 0 ceph version 14.2.4 (65249672c6e6d843510e7e01f8a4b976dcac3db1) nautilus (stable), process ceph-mgr, pid 250399
2019-10-29 17:37:54.319 7f0e26fc1dc0 0 pidfile_write: ignore empty --pid-file
2019-10-29 17:37:54.331 7f0e26fc1dc0 1 mgr[py] Loading python module 'ansible'
2019-10-29 17:37:54.503 7f0e26fc1dc0 1 mgr[py] Loading python module 'balancer'
2019-10-29 17:37:54.531 7f0e26fc1dc0 1 mgr[py] Loading python module 'crash'
2019-10-29 17:37:54.551 7f0e26fc1dc0 1 mgr[py] Loading python module 'dashboard'
2019-10-29 17:37:54.915 7f0e26fc1dc0 1 mgr[py] Loading python module 'deepsea'
2019-10-29 17:37:55.071 7f0e26fc1dc0 1 mgr[py] Loading python module 'devicehealth'
2019-10-29 17:37:55.103 7f0e26fc1dc0 1 mgr[py] Loading python module 'influx'
2019-10-29 17:37:55.127 7f0e26fc1dc0 1 mgr[py] Loading python module 'insights'
2019-10-29 17:37:55.207 7f0e26fc1dc0 1 mgr[py] Loading python module 'iostat'
2019-10-29 17:37:55.227 7f0e26fc1dc0 1 mgr[py] Loading python module 'localpool'
2019-10-29 17:37:55.247 7f0e26fc1dc0 1 mgr[py] Loading python module 'orchestrator_cli'
2019-10-29 17:37:55.295 7f0e26fc1dc0 1 mgr[py] Loading python module 'pg_autoscaler'
2019-10-29 17:37:55.347 7f0e26fc1dc0 1 mgr[py] Loading python module 'progress'
2019-10-29 17:37:55.387 7f0e26fc1dc0 1 mgr[py] Loading python module 'prometheus'
2019-10-29 17:37:55.599 7f0e26fc1dc0 1 mgr[py] Loading python module 'rbd_support'
2019-10-29 17:37:55.647 7f0e26fc1dc0 1 mgr[py] Loading python module 'restful'
2019-10-29 17:37:55.959 7f0e26fc1dc0 1 mgr[py] Loading python module 'selftest'
2019-10-29 17:37:55.983 7f0e26fc1dc0 1 mgr[py] Loading python module 'status'
2019-10-29 17:37:56.015 7f0e26fc1dc0 1 mgr[py] Loading python module 'telegraf'
2019-10-29 17:37:56.051 7f0e26fc1dc0 1 mgr[py] Loading python module 'telemetry'
2019-10-29 17:37:56.331 7f0e26fc1dc0 1 mgr[py] Loading python module 'test_orchestrator'
2019-10-29 17:37:56.399 7f0e26fc1dc0 1 mgr[py] Loading python module 'volumes'
2019-10-29 17:37:56.459 7f0e26fc1dc0 1 mgr[py] Loading python module 'zabbix'
2019-10-29 17:37:56.503 7f0e21cdd700 1 mgr load Constructed class from module: dashboard
2019-10-29 17:37:56.503 7f0e214dc700 0 ms_deliver_dispatch: unhandled message 0x56346f978400 mon_map magic: 0 v1 from mon.0 v2:10.97.206.93:3300/0
2019-10-29 17:37:56.507 7f0e214dc700 0 client.0 ms_handle_reset on v2:10.97.206.93:6912/22258
2019-10-29 17:37:56.743 7f0e16363700 0 mgr[dashboard] [29/Oct/2019:17:37:56] ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line 2021, in start
    self.tick()
  File "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line 2090, in tick
    s, ssl_env = self.ssl_adapter.wrap(s)
  File "/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/ssl_builtin.py", line 67, in wrap
    server_side=True)
  File "/usr/lib/python2.7/ssl.py", line 369, in wrap_socket
    _context=self)
  File "/usr/lib/python2.7/ssl.py", line 599, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
error: [Errno 0] Error
^C

This looks like a severe issue.

On 29.10.2019 at 17:22, Bryan Stillwell wrote:
> On Oct 29, 2019, at 9:44 AM, Thomas Schneider <74cmonty@xxxxxxxxx> wrote:
>> in my unhealthy cluster I cannot run several ceph osd command because
>> they hang, e.g.
>> ceph osd df
>> ceph osd pg dump
>>
>> Also, ceph balancer status hangs.
>>
>> How can I fix this issue?
>
> Check the status of your ceph-mgr processes (restart them if needed and
> check the logs for more details). Those are responsible for handling
> those commands in recent releases.
>
> Bryan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
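Aside: when a mgr log floods with messenger errors like the BADAUTHORIZER lines above, it can help to summarize which peer address is rejecting the connection and how often, rather than eyeballing `tail -f`. A minimal, illustrative sketch (not a Ceph tool; the regex is an assumption based only on the Nautilus log format quoted above, and the sample line is copied from that log):

```python
import re
from collections import Counter

# Matches messenger lines of the form seen above, e.g.:
#   "... >> v1:10.97.206.96:7055/17961 conn(...).handle_connect_reply_2 connect got BADAUTHORIZER"
# Capture group 1 = peer address, group 2 = error token after "connect got".
PEER_RE = re.compile(r'>> (v\d+:[\d.]+:\d+/\d+) conn\(.*connect got (\w+)')

def summarize(lines):
    """Count (peer, error) pairs across a batch of mgr log lines."""
    counts = Counter()
    for line in lines:
        m = PEER_RE.search(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts

sample = [
    "2019-10-29 17:32:02.024 7fe20e881700 0 --1- 10.97.206.96:0/201758478 >> "
    "v1:10.97.206.96:7055/17961 conn(0x564582ad5180 0x5645991ca800 :-1 "
    "s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 "
    "connect got BADAUTHORIZER",
]
for (peer, err), n in summarize(sample).items():
    print(peer, err, n)
```

In the log above, this would show every rejection coming from the same peer on the mgr's own host, which narrows the problem to that daemon's cephx authorizer rather than the network.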