Here are all the active ports on mon1 (with the exception of sshd and ntpd):

# netstat -npl
Proto Recv-Q Send-Q Local Address    Foreign Address  State   PID/Program name
tcp        0      0 <mon1_ip>:3300   0.0.0.0:*        LISTEN  1582/ceph-mon
tcp        0      0 <mon1_ip>:6789   0.0.0.0:*        LISTEN  1582/ceph-mon
tcp6       0      0 :::9093          :::*             LISTEN  908/alertmanager
tcp6       0      0 :::9094          :::*             LISTEN  908/alertmanager
tcp6       0      0 :::9095          :::*             LISTEN  896/prometheus
tcp6       0      0 :::9100          :::*             LISTEN  906/node_exporter
tcp6       0      0 :::3000          :::*             LISTEN  882/grafana-server
udp6       0      0 :::9094          :::*                     908/alertmanager

I've tried telnet from the mon1 host and can connect to 3300 and 6789:

# telnet <mon1_ip> 3300
Trying <mon1_ip>...
Connected to <mon1_ip>.
Escape character is '^]'.
ceph v2

# telnet <mon1_ip> 6789
Trying <mon1_ip>...
Connected to <mon1_ip>.
Escape character is '^]'.
ceph v027QQ

Ports 6800 and 6801 refuse the connection:

# telnet <mon1_ip> 6800
Trying <mon1_ip>...
telnet: Unable to connect to remote host: Connection refused

I don't see any errors in the logs related to failures to bind, and all Ceph
systemd services are running as far as I can tell:

# systemctl list-units -a | grep ceph
ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@alertmanager.mon1.service   loaded active running Ceph alertmanager.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5
ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@crash.mon1.service          loaded active running Ceph crash.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5
ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@grafana.mon1.service        loaded active running Ceph grafana.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5
ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@mgr.mon1.peevkl.service     loaded active running Ceph mgr.mon1.peevkl for e30397f0-cc32-11ea-8c8e-000c29469cd5
ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@mon.mon1.service            loaded active running Ceph mon.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5
ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@node-exporter.mon1.service  loaded active running Ceph node-exporter.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5
ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@prometheus.mon1.service     loaded active running Ceph prometheus.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5
system-ceph\x2de30397f0\x2dcc32\x2d11ea\x2d8c8e\x2d000c29469cd5.slice loaded active active  system-ceph\x2de30397f0\x2dcc32\x2d11ea\x2d8c8e\x2d000c29469cd5.slice
ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5.target                      loaded active active  Ceph cluster e30397f0-cc32-11ea-8c8e-000c29469cd5
ceph.target                                                           loaded active active  All Ceph clusters and services

Here are the currently running docker containers:

# docker ps
CONTAINER ID  IMAGE                       COMMAND                 CREATED         STATUS         PORTS  NAMES
dfd8dbeccf1e  ceph/ceph:v15               "/usr/bin/ceph-mgr -…"  41 minutes ago  Up 41 minutes         ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-mgr.mon1.peevkl
9452d1db7ffb  ceph/ceph:v15               "/usr/bin/ceph-mon -…"  3 hours ago     Up 3 hours            ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-mon.mon1
703ec4a43824  prom/prometheus:v2.18.1     "/bin/prometheus --c…"  3 hours ago     Up 3 hours            ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-prometheus.mon1
d816ec5e645f  ceph/ceph:v15               "/usr/bin/ceph-crash…"  3 hours ago     Up 3 hours            ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-crash.mon1
38d283ba6424  ceph/ceph-grafana:latest    "/bin/sh -c 'grafana…"  3 hours ago     Up 3 hours            ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-grafana.mon1
cc119ec8f09a  prom/node-exporter:v0.18.1  "/bin/node_exporter …"  3 hours ago     Up 3 hours            ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-node-exporter.mon1
aa1d339c4100  prom/alertmanager:v0.20.0   "/bin/alertmanager -…"  3 hours ago     Up 3 hours            ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-alertmanager.mon1

iptables is active. I tried setting all chain policies to ACCEPT (didn't
help); the relevant rules are:

   0     0 CEPH  tcp  --  *  *  0.0.0.0/0  0.0.0.0/0  tcp dpt:6789
5060  303K CEPH  tcp  --  *  *  0.0.0.0/0  0.0.0.0/0  multiport dports 6800:7300

Chain CEPH includes the addresses of the monitors and OSDs.
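Since the switch to iptables-legacy (plus the docker restart) is the only change
I made before the cluster broke, I'm now comparing what the two iptables
backends actually have loaded, roughly like this (just a sketch; iptables-legacy,
iptables-nft and nft are the binaries shipped with Debian 10):

which backend the iptables wrapper currently points at:
# update-alternatives --display iptables

what the legacy backend has loaded:
# iptables-legacy -L -n -v
# iptables-legacy -t nat -L -n -v

what the nft backend / nftables itself has loaded:
# iptables-nft -L -n -v
# iptables-nft -t nat -L -n -v
# nft list ruleset

My understanding is that docker programs its chains through whichever backend is
selected when it starts, so rules for the ceph ports could end up in one table
while older rules still sit in the other.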
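For reference, this is roughly how I've been grepping for bind failures on this
host (as far as I know the mgr is the daemon that should be listening in the
6800-7300 range here); the container and unit names are the ones from the
docker ps / systemctl output above, and the grep pattern is just a guess at
what such a failure would look like:

# docker logs ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-mgr.mon1.peevkl 2>&1 | grep -iE 'bind|error'
# journalctl -u ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@mgr.mon1.peevkl.service --since "1 hour ago"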
Mon, 27 Jul 2020 at 17:07, Dino Godor <dg@xxxxxxxxxxxx>:

> Hi,
>
> have you tried to locally connect to the ports with netcat (or telnet)?
>
> Is the process listening? (something like netstat -4ln or the current
> equivalent thereof)
>
> Is the old (new) firewall maybe still running?
>
>
> On 27.07.20 16:00, Илья Борисович Волошин wrote:
> > Hello,
> >
> > I've created an Octopus 15.2.4 cluster with 3 monitors and 3 OSDs
> > (6 hosts in total, all ESXi VMs). It lived through a couple of reboots
> > without problems, then I reconfigured the main host a bit: I set
> > iptables-legacy as the current option in update-alternatives (this is
> > a Debian 10 system), applied a basic iptables ruleset and restarted
> > docker.
> >
> > After that the cluster became unresponsive (any ceph command hangs
> > indefinitely). I can still use the admin socket to manipulate the
> > config, though. Setting debug_ms to 5, I see this in the logs
> > (timestamps cut for readability):
> >
> > 7f4096f41700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> [v2:<mon2_ip>:3300/0,v1:<mon2_ip>:6789/0] conn(0x55c21b975800 0x55c21ab45180 unknown :-1 s=START_CONNECT pgs=0 cs=0 l=0 rx=0 tx=0).send_message enqueueing message m=0x55c21bd84a00 type=67 mon_probe(probe e30397f0-cc32-11ea-8c8e-000c29469cd5 name mon1 mon_release octopus) v7
> > 7f4098744700 1 -- >> [v2:<mon1_ip>:6800/561959008,v1:<mon1_ip>:6801/561959008] conn(0x55c21b974400 msgr2=0x55c21ab45600 unknown :-1 s=STATE_CONNECTING_RE l=0).process reconnect failed to v2:<mon1_ip>:6800/561959008
> > 7f4098744700 2 -- >> [v2:<mon1_ip>:6800/561959008,v1:<mon1_ip>:6801/561959008] conn(0x55c21b974400 msgr2=0x55c21ab45600 unknown :-1 s=STATE_CONNECTING_RE l=0).process connection refused!
> >
> > and this:
> >
> > 7f4098744700 2 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0)._fault on lossy channel, failing
> > 7f4098744700 1 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0).stop
> > 7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0).reset_recv_state
> > 7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0).reset_security
> > 7f409373a700 1 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).accept
> > 7f4098744700 1 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
> > 7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0).handle_hello received hello: peer_type=8 peer_addr_for_me=v2:<mon1_ip>:3300/0
> > 7f4098744700 5 --2- [v2:<mon1_ip>:3300/0,v1:<mon1_ip>:6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0).handle_hello getsockname says I am <mon1_ip>:3300 when talking to v2:<mon1_ip>:49012/0
> > 7f4098744700 1 mon.mon1@0(probing) e5 handle_auth_request failed to assign global_id
> >
> > Config (the result of ceph --admin-daemon
> > /run/ceph/e30397f0-cc32-11ea-8c8e-000c29469cd5/ceph-mon.mon1.asok config show):
> > https://pastebin.com/kifMXs9H
> >
> > I can connect to ports 3300 and 6789 with telnet; 6800 and 6801 return
> > 'process connection refused'.
> >
> > Setting all iptables policies to ACCEPT didn't change anything.
> >
> > Where should I start digging to fix this problem? I'd like to at least
> > understand why this happened before putting the cluster into production.
> > Any help is appreciated.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx