Hi,

I am running a 3-node Deis cluster with Ceph as the underlying filesystem, so Ceph runs inside Docker containers on three separate servers. I rebooted all three nodes (almost at once). After the reboot, the Ceph monitors refuse to connect to each other.

Symptoms are:
- no quorum is formed
- the ceph admin socket file does not exist
- only the following appears in the ceph log:

    Dec 14 16:38:44 deis-1 sh[933]: 2014-12-14 08:38:44.265419 7f5cec71f700 0 -- :/1000021 >> 10.132.183.191:6789/0 pipe(0x7f5ce40296a0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f5ce4029930).fault
    Dec 14 16:38:44 deis-1 sh[933]: 2014-12-14 08:38:44.265419 7f5cec71f700 0 -- :/1000021 >> 10.132.183.192:6789/0 pipe(0x7f5ce40296a0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f5ce4029930).fault
    Dec 14 16:38:50 deis-1 sh[933]: 2014-12-14 08:38:50.267398 7f5cec71f700 0 -- :/1000021 >> 10.132.183.190:6789/0 pipe(0x7f5cd40030e0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f5cd4003370).fault
    ...keep repeating...

This is my /etc/ceph/ceph.conf file:

    [global]
    fsid = cc368515-9dc6-48e2-9526-58ac4cbb3ec9
    mon initial members = deis-3
    auth cluster required = cephx
    auth service required = cephx
    auth client required = cephx
    osd pool default size = 3
    osd pool default min_size = 1
    osd pool default pg_num = 128
    osd pool default pgp_num = 128
    osd recovery delay start = 15
    log file = /dev/stdout

    [mon.deis-3]
    host = deis-3
    mon addr = 10.132.183.190:6789

    [mon.deis-1]
    host = deis-1
    mon addr = 10.132.183.191:6789

    [mon.deis-2]
    host = deis-2
    mon addr = 10.132.183.192:6789

    [client.radosgw.gateway]
    host = deis-store-gateway
    keyring = /etc/ceph/ceph.client.radosgw.keyring
    rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock
    log file = /dev/stdout

iptables rules on the Docker host:

    core@deis-3 ~ $ sudo iptables --list
    Chain INPUT (policy DROP)
    target          prot opt source          destination
    Firewall-INPUT  all  --  anywhere        anywhere

    Chain FORWARD (policy DROP)
    target          prot opt source          destination
    ACCEPT          tcp  --  anywhere        172.17.0.2    tcp dpt:http
    ACCEPT          tcp  --  anywhere        172.17.0.2    tcp dpt:https
    ACCEPT          tcp  --  anywhere        172.17.0.2    tcp dpt:2222
    ACCEPT          all  --  anywhere        anywhere      ctstate RELATED,ESTABLISHED
    ACCEPT          all  --  anywhere        anywhere
    ACCEPT          all  --  anywhere        anywhere
    Firewall-INPUT  all  --  anywhere        anywhere

    Chain OUTPUT (policy ACCEPT)
    target          prot opt source          destination

    Chain Firewall-INPUT (2 references)
    target          prot opt source          destination
    ACCEPT          all  --  anywhere        anywhere
    ACCEPT          icmp --  anywhere        anywhere      icmp echo-reply
    ACCEPT          icmp --  anywhere        anywhere      icmp destination-unreachable
    ACCEPT          icmp --  anywhere        anywhere      icmp time-exceeded
    ACCEPT          icmp --  anywhere        anywhere      icmp echo-request
    ACCEPT          all  --  anywhere        anywhere      ctstate RELATED,ESTABLISHED
    ACCEPT          all  --  10.132.183.190  anywhere
    ACCEPT          all  --  10.132.183.192  anywhere
    ACCEPT          all  --  10.132.183.191  anywhere
    ACCEPT          all  --  anywhere        anywhere
    ACCEPT          tcp  --  anywhere        anywhere      ctstate NEW multiport dports ssh,2222,http,https
    LOG             all  --  anywhere        anywhere      LOG level warning
    REJECT          all  --  anywhere        anywhere      reject-with icmp-host-prohibited

All private IPs are pingable from within the ceph monitor container.

What could I do next to troubleshoot this issue?

Thanks a lot!

- Jimmy Chu
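
P.S. Ping only shows ICMP reachability, so one check I can still run is whether the monitor TCP port itself accepts connections from inside the container. Below is a minimal sketch of such a check, assuming Python 3 is available there. The addresses are the `mon addr` values from the ceph.conf above; the helper names (`split_addr`, `can_connect`, `report`) are my own for illustration and are not part of ceph.

```python
import socket

# Monitor addresses copied from the [mon.*] sections of the ceph.conf above.
MON_ADDRS = {
    "deis-3": "10.132.183.190:6789",
    "deis-1": "10.132.183.191:6789",
    "deis-2": "10.132.183.192:6789",
}


def split_addr(addr):
    """Split a 'host:port' string into (host, port as int)."""
    host, _, port = addr.rpartition(":")
    return host, int(port)


def can_connect(host, port, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper: it only proves that something accepts connections
    on the port, not that a healthy ceph monitor is behind it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def report(mon_addrs=MON_ADDRS):
    """Print one line per monitor stating whether its TCP port is reachable."""
    for name, addr in mon_addrs.items():
        host, port = split_addr(addr)
        state = "open" if can_connect(host, port) else "unreachable"
        print(f"mon.{name} {host}:{port} {state}")
```

Calling `report()` from inside each monitor container would show which of the three monitor ports are reachable from that node. If a port shows as unreachable while ping works, that would point at the monitor process not listening or at filtering on the path, rather than at basic routing.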