Re: Several ceph osd commands hang

Thanks.

2 of 4 MGR nodes are unhealthy; I have stopped the MGR service on both.

When I start the service again on node A, its log shows:
root@ld5508:~# tail -f /var/log/ceph/ceph-mgr.ld5508.log
2019-10-29 17:32:02.024 7fe20e881700  0 --1- 10.97.206.96:0/201758478 >>
v1:10.97.206.96:7055/17961 conn(0x564582ad5180 0x5645991ca800 :-1
s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2
connect got BADAUTHORIZER
2019-10-29 17:32:02.028 7fe20e881700  0 --1- 10.97.206.96:0/201758478 >>
v1:10.97.206.96:7055/17961 conn(0x5645977b5180 0x564582b2e800 :-1
s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2
connect got BADAUTHORIZER
[... the same BADAUTHORIZER line repeats every few milliseconds, alternating between the two connections above, until the service is terminated ...]
2019-10-29 17:32:02.064 7fe209fe8700 -1 received  signal: Terminated
from /sbin/init  (PID: 1) UID: 0
2019-10-29 17:32:02.064 7fe209fe8700 -1 mgr handle_signal *** Got signal
Terminated ***
2019-10-29 17:37:54.319 7f0e26fc1dc0  0 set uid:gid to 64045:64045
(ceph:ceph)
2019-10-29 17:37:54.319 7f0e26fc1dc0  0 ceph version 14.2.4
(65249672c6e6d843510e7e01f8a4b976dcac3db1) nautilus (stable), process
ceph-mgr, pid 250399
2019-10-29 17:37:54.319 7f0e26fc1dc0  0 pidfile_write: ignore empty
--pid-file
2019-10-29 17:37:54.331 7f0e26fc1dc0  1 mgr[py] Loading python module
'ansible'
2019-10-29 17:37:54.503 7f0e26fc1dc0  1 mgr[py] Loading python module
'balancer'
2019-10-29 17:37:54.531 7f0e26fc1dc0  1 mgr[py] Loading python module
'crash'
2019-10-29 17:37:54.551 7f0e26fc1dc0  1 mgr[py] Loading python module
'dashboard'
2019-10-29 17:37:54.915 7f0e26fc1dc0  1 mgr[py] Loading python module
'deepsea'
2019-10-29 17:37:55.071 7f0e26fc1dc0  1 mgr[py] Loading python module
'devicehealth'
2019-10-29 17:37:55.103 7f0e26fc1dc0  1 mgr[py] Loading python module
'influx'
2019-10-29 17:37:55.127 7f0e26fc1dc0  1 mgr[py] Loading python module
'insights'
2019-10-29 17:37:55.207 7f0e26fc1dc0  1 mgr[py] Loading python module
'iostat'
2019-10-29 17:37:55.227 7f0e26fc1dc0  1 mgr[py] Loading python module
'localpool'
2019-10-29 17:37:55.247 7f0e26fc1dc0  1 mgr[py] Loading python module
'orchestrator_cli'
2019-10-29 17:37:55.295 7f0e26fc1dc0  1 mgr[py] Loading python module
'pg_autoscaler'
2019-10-29 17:37:55.347 7f0e26fc1dc0  1 mgr[py] Loading python module
'progress'
2019-10-29 17:37:55.387 7f0e26fc1dc0  1 mgr[py] Loading python module
'prometheus'
2019-10-29 17:37:55.599 7f0e26fc1dc0  1 mgr[py] Loading python module
'rbd_support'
2019-10-29 17:37:55.647 7f0e26fc1dc0  1 mgr[py] Loading python module
'restful'
2019-10-29 17:37:55.959 7f0e26fc1dc0  1 mgr[py] Loading python module
'selftest'
2019-10-29 17:37:55.983 7f0e26fc1dc0  1 mgr[py] Loading python module
'status'
2019-10-29 17:37:56.015 7f0e26fc1dc0  1 mgr[py] Loading python module
'telegraf'
2019-10-29 17:37:56.051 7f0e26fc1dc0  1 mgr[py] Loading python module
'telemetry'
2019-10-29 17:37:56.331 7f0e26fc1dc0  1 mgr[py] Loading python module
'test_orchestrator'
2019-10-29 17:37:56.399 7f0e26fc1dc0  1 mgr[py] Loading python module
'volumes'
2019-10-29 17:37:56.459 7f0e26fc1dc0  1 mgr[py] Loading python module
'zabbix'
2019-10-29 17:37:56.503 7f0e21cdd700  1 mgr load Constructed class from
module: dashboard
2019-10-29 17:37:56.503 7f0e214dc700  0 ms_deliver_dispatch: unhandled
message 0x56346f978400 mon_map magic: 0 v1 from mon.0 v2:10.97.206.93:3300/0
2019-10-29 17:37:56.507 7f0e214dc700  0 client.0 ms_handle_reset on
v2:10.97.206.93:6912/22258
2019-10-29 17:37:56.743 7f0e16363700  0 mgr[dashboard]
[29/Oct/2019:17:37:56] ENGINE Error in HTTPServer.tick
Traceback (most recent call last):
  File
"/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line
2021, in start
    self.tick()
  File
"/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/__init__.py", line
2090, in tick
    s, ssl_env = self.ssl_adapter.wrap(s)
  File
"/usr/lib/python2.7/dist-packages/cherrypy/wsgiserver/ssl_builtin.py",
line 67, in wrap
    server_side=True)
  File "/usr/lib/python2.7/ssl.py", line 369, in wrap_socket
    _context=self)
  File "/usr/lib/python2.7/ssl.py", line 599, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
error: [Errno 0] Error

^C
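From what I have read, BADAUTHORIZER usually points at either a cephx keyring mismatch (the key on the mgr's disk differs from what the monitors have on record) or clock skew between the nodes, since cephx tickets are time-limited. A rough way to check both on node A (the mgr id "ld5508" and the default keyring path are from my setup; adjust as needed):

```shell
# Compare the mgr key the monitors know about with the key on disk.
ceph auth get mgr.ld5508
cat /var/lib/ceph/mgr/ceph-ld5508/keyring

# If the two keys differ, refresh the on-disk keyring from the monitors:
# ceph auth get mgr.ld5508 -o /var/lib/ceph/mgr/ceph-ld5508/keyring

# Rule out clock skew, which also produces BADAUTHORIZER:
timedatectl status
chronyc tracking    # or: ntpq -p, depending on the time daemon in use
```

I am not certain this is the cause here, but these are cheap checks before digging deeper.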

This looks like a severe issue: the mgr is rejected with BADAUTHORIZER when connecting to its peer, and after the restart the dashboard module fails its SSL handshake with "[Errno 0] Error".
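The dashboard traceback may be a separate problem from the BADAUTHORIZER errors: the SSL handshake in CherryPy is failing, which can happen when the dashboard's certificate is missing or broken. If that is the case here, Nautilus can regenerate the self-signed certificate, or SSL can be disabled temporarily while debugging (commands assume admin access; the mgr unit name is guessed from the hostname):

```shell
# Regenerate the dashboard's self-signed certificate:
ceph dashboard create-self-signed-cert

# Alternatively, rule out SSL entirely while debugging:
ceph config set mgr mgr/dashboard/ssl false

# Restart the mgr so the dashboard picks up the change:
systemctl restart ceph-mgr@ld5508
```

This is only a guess based on the traceback, not something I have confirmed on this cluster.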


Am 29.10.2019 um 17:22 schrieb Bryan Stillwell:
> On Oct 29, 2019, at 9:44 AM, Thomas Schneider <74cmonty@xxxxxxxxx> wrote:
>> in my unhealthy cluster I cannot run several ceph osd command because
>> they hang, e.g.
>> ceph osd df
>> ceph osd pg dump
>>
>> Also, ceph balancer status hangs.
>>
>> How can I fix this issue?
> Check the status of your ceph-mgr processes (restart them if needed and check the logs for more details).  Those are responsible for handling those commands in recent releases.
>
> Bryan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



