Hi all,
I've set up a 6-node ceph cluster to learn how ceph works and what I can
do with it. However, I'm new to Ceph, so if the answer to my question
is RTFM, please point me to the right place.
My problem is this:
The cluster consists of 3 mons and 3 osds. Even though the dashboard
shows all green, mon01 has a problem: the ceph command hangs and never
returns:
root@mon01:~# ceph --version
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)
root@mon01:~# ceph -s
^CCluster connection aborted
To see what happens, I tried this:
root@mon01:~# ceph -s --debug-ms=1
2022-01-10T15:51:30.434+0100 7f4a2cd7e700 1 Processor -- start
2022-01-10T15:51:30.434+0100 7f4a2cd7e700 1 -- start start
2022-01-10T15:51:30.434+0100 7f4a2cd7e700 1 --2- >> [v2:192.168.14.48:3300/0,v1:192.168.14.48:6789/0] conn(0x7f4a28066a30 0x7f4a28066e40 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).connect
2022-01-10T15:51:30.434+0100 7f4a2cd7e700 1 -- --> [v2:192.168.14.48:3300/0,v1:192.168.14.48:6789/0] -- mon_getmap magic: 0 v1 -- 0x7f4a28067330 con 0x7f4a28066a30
2022-01-10T15:51:30.434+0100 7f4a2659c700 1 -- >> [v2:192.168.14.48:3300/0,v1:192.168.14.48:6789/0] conn(0x7f4a28066a30 msgr2=0x7f4a28066e40 unknown :-1 s=STATE_CONNECTING_RE l=0).process reconnect failed to v2:192.168.14.48:3300/0
...
Indeed, both ports are closed:
root@mon01:~# nc -z 192.168.14.48 6789; echo $?
1
root@mon01:~# nc -z 192.168.14.48 3300; echo $?
1
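Since both ports are closed, I assume the mon daemon itself is not
running, or at least not listening. If I understand cephadm correctly,
something like this should tell me whether the daemon and its systemd
unit are up (the unit name with the fsid is my guess from the docs,
not verified):
root@mon01:~# cephadm ls | grep -A 5 '"mon.mon01"'
root@mon01:~# systemctl status ceph-<fsid>@mon.mon01.service
root@mon01:~# ss -tlnp | grep -E '3300|6789'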
In /var/log/ceph/cephadm.log, I cannot find any useful information
about what might be going wrong.
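What I have not found yet is the monitor's own log. If I read the
cephadm docs correctly, a daemon's journal can be shown like this (I'm
assuming the daemon name is mon.mon01, matching the hostname):
root@mon01:~# cephadm logs --name mon.mon01
root@mon01:~# journalctl -u ceph-<fsid>@mon.mon01.service -n 100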
I'm not aware of anything I could have done to trigger this error, and I
wonder what I could do next to repair this monitor node.
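If the daemon is simply down, I suppose I could try to restart or
redeploy it through the orchestrator from one of the working mons,
e.g. like this (untested on my side, so please tell me if this is a
bad idea with only 3 mons):
root@mon02:~# ceph orch daemon restart mon.mon01
root@mon02:~# ceph orch daemon redeploy mon.mon01
But before I blindly restart things, I'd like to understand what
actually happened here.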
Any hint is appreciated.
--
Andre Tann