Hi all,

my monitor mon3 cannot rejoin the cluster (mon1, mon2 and mon3, all running the stable Emperor release). I tried to recreate a new monmap and inject it into all three mons, but only mon1 and mon2 come up and join the quorum. After enabling debugging on mon3, I get the following:

    2014-01-30 08:51:03.823669 7f39b3f56700 10 mon.ceph-mon3@2(probing) e3 handle_probe_reply mon.1 192.168.135.32:6789/0 mon_probe(reply c7b12656-15a6-41b0-963f-4f47c62497dc name ceph-mon2 quorum 0,1 paxos( fc 1 lc 160 )) v5
    2014-01-30 08:51:03.823678 7f39b3f56700 10 mon.ceph-mon3@2(probing) e3 monmap is e3: 3 mons at {mon.ceph-mon1=192.168.135.31:6789/0,mon.ceph-mon2=192.168.135.32:6789/0,mon.ceph-mon3=192.168.135.33:6789/0}
    2014-01-30 08:51:03.823701 7f39b3f56700 10 mon.ceph-mon3@2(probing) e3 peer name is mon.ceph-mon2
    2014-01-30 08:51:03.823706 7f39b3f56700 10 mon.ceph-mon3@2(probing) e3 existing quorum 0,1
    2014-01-30 08:51:03.823708 7f39b3f56700 10 mon.ceph-mon3@2(probing) e3 peer paxos version 160 vs my version 154 (ok)
    2014-01-30 08:51:03.823711 7f39b3f56700 10 mon.ceph-mon3@2(probing) e3 ready to join, but i'm not in the monmap or my addr is blank, trying to join

But why does mon3 report "i'm not in the monmap" when it is listed in the monmap?
Checking the sources at https://github.com/ceph/ceph/blob/emperor/src/mon/Monitor.cc, the relevant condition is:

    if (monmap->contains(name) &&
        !monmap->get_addr(name).is_blank_ip()) {
      // i'm part of the cluster; just initiate a new election
      start_election();
    } else {
      dout(10) << " ready to join, but i'm not in the monmap or my addr is blank, trying to join" << dendl;
      messenger->send_message(new MMonJoin(monmap->fsid, name, messenger->get_myaddr()),
                              monmap->get_inst(*m->quorum.begin()));
    }

My map on mon3 looks like this:

    root@ceph-mon3:/var/log/ceph# ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon3.asok mon_status
    { "name": "ceph-mon3",
      "rank": 2,
      "state": "probing",
      "election_epoch": 0,
      "quorum": [],
      "outside_quorum": [],
      "extra_probe_peers": [],
      "sync_provider": [],
      "monmap": { "epoch": 3,
          "fsid": "c7b12656-15a6-41b0-963f-4f47c62497dc",
          "modified": "2014-01-30 08:27:28.808771",
          "created": "2014-01-30 08:27:28.808771",
          "mons": [
                { "rank": 0,
                  "name": "mon.ceph-mon1",
                  "addr": "192.168.135.31:6789\/0"},
                { "rank": 1,
                  "name": "mon.ceph-mon2",
                  "addr": "192.168.135.32:6789\/0"},
                { "rank": 2,
                  "name": "mon.ceph-mon3",
                  "addr": "192.168.135.33:6789\/0"}]}}

So the condition "(monmap->contains(name) && !monmap->get_addr(name).is_blank_ip())" should evaluate to true, shouldn't it? Yet start_election() is never called. Can somebody help me here?
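To make the condition easier to reason about, here is a simplified C++ model of it. It is a sketch, not Ceph's code: SimpleMonMap, addr_is_blank(), would_start_election() and mon3_local_map() are hypothetical names, the monmap is modelled as a std::map from mon name to address string, contains() is assumed to be an exact string lookup, and the blank-address test is only a rough stand-in for is_blank_ip(). The model feeds in the names exactly as mon3's mon_status prints them: "mon.ceph-mon3" in the monmap, and "ceph-mon3" as the daemon's own name.

```cpp
// Simplified model of the join/election check in Monitor.cc (emperor).
// ASSUMPTIONS: the monmap is a map from mon name to address string, and
// contains() is an exact-match lookup. This is NOT Ceph's MonMap class.
#include <map>
#include <string>

// Hypothetical stand-in for Ceph's MonMap.
struct SimpleMonMap {
  std::map<std::string, std::string> mon_addr;  // name -> "ip:port/nonce"

  // Exact string comparison on the mon name.
  bool contains(const std::string& name) const {
    return mon_addr.count(name) > 0;
  }

  // Rough stand-in for is_blank_ip(): a missing, empty or 0.0.0.0 address
  // counts as blank.
  bool addr_is_blank(const std::string& name) const {
    auto it = mon_addr.find(name);
    if (it == mon_addr.end()) return true;
    return it->second.empty() || it->second.rfind("0.0.0.0", 0) == 0;
  }
};

// Mirrors the if-condition: true means start_election(),
// false means "send MMonJoin, trying to join".
bool would_start_election(const SimpleMonMap& m, const std::string& name) {
  return m.contains(name) && !m.addr_is_blank(name);
}

// The three entries as they appear in mon3's mon_status output.
SimpleMonMap mon3_local_map() {
  SimpleMonMap m;
  m.mon_addr["mon.ceph-mon1"] = "192.168.135.31:6789/0";
  m.mon_addr["mon.ceph-mon2"] = "192.168.135.32:6789/0";
  m.mon_addr["mon.ceph-mon3"] = "192.168.135.33:6789/0";
  return m;
}
```

Under these assumptions, would_start_election(mon3_local_map(), "ceph-mon3") is false because "ceph-mon3" and "mon.ceph-mon3" are different keys, which would lead to the "trying to join" branch seen in the log; looking up "mon.ceph-mon3" instead returns true. Whether the real MonMap::contains() behaves the same way is worth confirming in src/mon/MonMap.h.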
regards
Danny

More information about mon3:

    root@ceph-mon3:/var/log/ceph# hostname -a
    ceph-mon3
    root@ceph-mon3:/var/log/ceph# netstat -tulpen | grep ceph-mon
    tcp        0      0 192.168.135.33:6789     0.0.0.0:*     LISTEN      0          635369      2164/ceph-mon
    root@ceph-mon3:/var/log/ceph# cat /etc/hosts
    127.0.0.1       localhost
    192.168.135.33  ceph-mon3.dtnet.de ceph-mon3

    admin@ceph-admin:~/cluster1$ ceph -s
        cluster c7b12656-15a6-41b0-963f-4f47c62497dc
         health HEALTH_WARN 192 pgs degraded; 192 pgs stale; 192 pgs stuck stale; 192 pgs stuck unclean; 1 mons down, quorum 0,1 ceph-mon1,ceph-mon2
         monmap e3: 3 mons at {ceph-mon1=192.168.135.31:6789/0,ceph-mon2=192.168.135.32:6789/0,ceph-mon3=192.168.135.33:6789/0}, election epoch 230, quorum 0,1 ceph-mon1,ceph-mon2
         osdmap e28: 1 osds: 1 up, 1 in
          pgmap v38: 192 pgs, 3 pools, 0 bytes data, 0 objects
                36388 kB used, 3724 GB / 3724 GB avail
                     192 stale+active+degraded
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com