Hi,

Now I feel dumb for jumping to the conclusion that it was a simple networking issue: it isn't. I've just checked connectivity properly, and I can ping and telnet to port 6789 from every mon server to every other mon server.

I've just restarted the mon03 service and the log is showing the following:

2014-09-17 16:49:02.355148 7f7ef9f8c800 0 starting mon.ceph-mon-03 rank 2 at 10.1.1.66:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph-mon-03 fsid 74069c87-b361-4bb8-8ce8-6ae9deb8a9bd
2014-09-17 16:49:02.355375 7f7ef9f8c800 1 mon.ceph-mon-03 at -1(probing) e2 preinit fsid 74069c87-b361-4bb8-8ce8-6ae9deb8a9bd
2014-09-17 16:49:02.356347 7f7ef9f8c800 1 mon.ceph-mon-03 at -1(probing).paxosservice(pgmap 18241250..18241952) refresh upgraded, format 0 -> 1
2014-09-17 16:49:02.356360 7f7ef9f8c800 1 mon.ceph-mon-03 at -1(probing).pg v0 on_upgrade discarding in-core PGMap
2014-09-17 16:49:02.400316 7f7ef9f8c800 0 mon.ceph-mon-03 at -1(probing).mds e1 print_map
epoch 1
flags 0
created 2013-12-09 10:19:58.534310
modified 2013-12-09 10:19:58.534332
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_file_size 1099511627776
last_failure 0
last_failure_osd_epoch 0
compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding}
max_mds 1
in
up {}
failed
stopped
data_pools 0
metadata_pool 1
inline_data disabled
2014-09-17 16:49:02.402373 7f7ef9f8c800 0 mon.ceph-mon-03 at -1(probing).osd e49212 crush map has features 1107558400, adjusting msgr requires
2014-09-17 16:49:02.402384 7f7ef9f8c800 0 mon.ceph-mon-03 at -1(probing).osd e49212 crush map has features 1107558400, adjusting msgr requires
2014-09-17 16:49:02.402386 7f7ef9f8c800 0 mon.ceph-mon-03 at -1(probing).osd e49212 crush map has features 1107558400, adjusting msgr requires
2014-09-17 16:49:02.402388 7f7ef9f8c800 0 mon.ceph-mon-03 at -1(probing).osd e49212 crush map has features 1107558400, adjusting msgr requires
2014-09-17 16:49:02.403725 7f7ef9f8c800 1 mon.ceph-mon-03 at -1(probing).paxosservice(auth 26001..26154) refresh upgraded, format 0 -> 1
2014-09-17 16:49:02.404834 7f7ef9f8c800 0 mon.ceph-mon-03 at -1(probing) e2 my rank is now 2 (was -1)
2014-09-17 16:49:02.407439 7f7ef331b700 1 mon.ceph-mon-03 at 2(synchronizing) e2 sync_obtain_latest_monmap
2014-09-17 16:49:02.407588 7f7ef331b700 1 mon.ceph-mon-03 at 2(synchronizing) e2 sync_obtain_latest_monmap obtained monmap e2
2014-09-17 16:49:09.514365 7f7ef331b700 0 log [INF] : mon.ceph-mon-03 calling new monitor election
2014-09-17 16:49:09.514523 7f7ef331b700 1 mon.ceph-mon-03 at 2(electing).elector(931) init, last seen epoch 931
2014-09-17 16:49:09.514658 7f7ef331b700 1 mon.ceph-mon-03 at 2(electing).paxos(paxos recovering c 31223899..31224482) is_readable now=2014-09-17 16:49:09.514659 lease_expire=0.000000 has v0 lc 31224482
2014-09-17 16:49:09.514665 7f7ef331b700 1 mon.ceph-mon-03 at 2(electing).paxos(paxos recovering c 31223899..31224482) is_readable now=2014-09-17 16:49:09.514666 lease_expire=0.000000 has v0 lc 31224482
2014-09-17 16:49:15.533876 7f7ef3b1c700 1 mon.ceph-mon-03 at 2(electing).elector(933) init, last seen epoch 933
2014-09-17 16:49:21.578269 7f7ef3b1c700 1 mon.ceph-mon-03 at 2(electing).elector(935) init, last seen epoch 935
2014-09-17 16:49:26.578526 7f7ef3b1c700 1 mon.ceph-mon-03 at 2(electing).elector(935) init, last seen epoch 935
2014-09-17 16:49:31.578790 7f7ef3b1c700 1 mon.ceph-mon-03 at 2(electing).elector(935) init, last seen epoch 935
2014-09-17 16:49:36.579044 7f7ef3b1c700 1 mon.ceph-mon-03 at 2(electing).elector(935) init, last seen epoch 935

The last lines about "electing" repeat forever.

The other mons are logging far more entries than I have seen them log before.
They look like the following (note the timestamps - all of these log lines are from just a 2 second period): 2014-09-17 16:55:10.019407 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.019408 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.019418 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.019418 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.180220 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.180222 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.180233 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.180234 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.192668 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.192670 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.192691 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.192692 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.276726 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.276727 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.276737 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.276737 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.302638 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.302640 lease_expire=2014-09-17 16:55:14.518716 
has v0 lc 31225038 2014-09-17 16:55:10.302651 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.302652 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.362642 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.362643 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.362655 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.362656 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.385686 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.385687 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.385697 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.385697 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.406712 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.406713 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.406723 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.406724 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.423277 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.423279 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.423299 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225038) is_readable now=2014-09-17 16:55:10.423300 lease_expire=2014-09-17 16:55:14.518716 has v0 lc 31225038 2014-09-17 16:55:10.543138 7fd5a479a700 1 mon.ceph-mon-02 at 
1(peon).paxos(paxos updating c 31224401..31225038) is_readable now=2014-09-17 16:55:10.543139 lease_expire=0.000000 has v0 lc 31225038 2014-09-17 16:55:10.543145 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos updating c 31224401..31225038) is_readable now=2014-09-17 16:55:10.543145 lease_expire=0.000000 has v0 lc 31225038 2014-09-17 16:55:10.580911 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225039) is_readable now=2014-09-17 16:55:10.580912 lease_expire=2014-09-17 16:55:15.549947 has v0 lc 31225039 2014-09-17 16:55:10.580922 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225039) is_readable now=2014-09-17 16:55:10.580923 lease_expire=2014-09-17 16:55:15.549947 has v0 lc 31225039 2014-09-17 16:55:10.580930 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225039) is_readable now=2014-09-17 16:55:10.580930 lease_expire=2014-09-17 16:55:15.549947 has v0 lc 31225039 2014-09-17 16:55:10.606130 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos updating c 31224401..31225039) is_readable now=2014-09-17 16:55:10.606131 lease_expire=0.000000 has v0 lc 31225039 2014-09-17 16:55:10.606136 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos updating c 31224401..31225039) is_readable now=2014-09-17 16:55:10.606137 lease_expire=0.000000 has v0 lc 31225039 2014-09-17 16:55:10.633460 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).log v12645471 check_sub sending message to client.2190462 10.1.1.10:0/1004032 with 1 entries (version 12645471) 2014-09-17 16:55:10.633632 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633633 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.633646 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633651 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.633657 
7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633658 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.633699 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633700 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.633707 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.633707 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.695127 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.695129 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.695151 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.695152 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.800013 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.800015 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.800030 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.800031 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.830432 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.830433 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.830441 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.830442 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.848954 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) 
is_readable now=2014-09-17 16:55:10.848956 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.848964 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.848965 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.887139 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.887140 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.887150 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.887151 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.913825 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.913827 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:10.913834 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:10.913835 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.010277 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.010279 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.010287 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.010288 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.098312 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.098314 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.098325 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.098326 lease_expire=2014-09-17 16:55:15.607320 has v0 
lc 31225040 2014-09-17 16:55:11.109040 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.109042 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.109053 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.109054 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.170705 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.170706 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.170713 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.170714 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.222537 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.222539 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.222549 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.222550 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.431510 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.431511 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.431524 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.431525 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.453664 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.453666 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.453685 7fd5a479a700 1 mon.ceph-mon-02 at 
1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.453687 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.520250 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.520252 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.520263 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225040) is_readable now=2014-09-17 16:55:11.520264 lease_expire=2014-09-17 16:55:15.607320 has v0 lc 31225040 2014-09-17 16:55:11.603991 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.603992 lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041 2014-09-17 16:55:11.610948 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.610949 lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041 2014-09-17 16:55:11.610965 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.610966 lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041 2014-09-17 16:55:11.622479 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.622480 lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041 2014-09-17 16:55:11.622495 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.622496 lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041 2014-09-17 16:55:11.787013 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.787014 lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041 2014-09-17 16:55:11.787024 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.787025 
lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.873613 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.873614 lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.873627 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.873628 lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.988465 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.988467 lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041
2014-09-17 16:55:11.988487 7fd5a479a700 1 mon.ceph-mon-02 at 1(peon).paxos(paxos active c 31224401..31225041) is_readable now=2014-09-17 16:55:11.988489 lease_expire=2014-09-17 16:55:16.575181 has v0 lc 31225041

I'm wondering at this point whether I should just reinject the monmap from mon01 or mon02 into mon03, or whether there is something else that can be done to fix this.

With hindsight, I would have stopped the mon service before relocating the NIC cable, but I expected the mon to survive a short network outage, which it doesn't seem to have done :(

On 17 September 2014 16:21, James Eckersall <james.eckersall at gmail.com> wrote:

> Hi,
>
> Thanks for the advice.
>
> I feel pretty dumb as it does indeed look like a simple networking issue.
> You know how you check things 5 times and miss the most obvious one...
>
> J
>
> On 17 September 2014 16:04, Florian Haas <florian at hastexo.com> wrote:
>
>> On Wed, Sep 17, 2014 at 1:58 PM, James Eckersall
>> <james.eckersall at gmail.com> wrote:
>> > Hi,
>> >
>> > I have a ceph cluster running 0.80.1 on Ubuntu 14.04. I have 3 monitors and
>> > 4 OSD nodes currently.
>> >
>> > Everything has been running great up until today where I've got an issue
>> > with the monitors.
>> > I moved mon03 to a different switchport so it would have temporarily lost
>> > connectivity.
>> > Since then, the cluster is reporting that that mon is down, although it's
>> > definitely up.
>> > I've tried restarting the mon services on all three mons, but that hasn't
>> > made a difference.
>> > I definitely, 100% do not have any clock skew on any of the mons. This has
>> > been triple-checked as the ceph docs seem to suggest that might be the cause
>> > of this issue.
>> >
>> > Here is what ceph -s and ceph health detail are reporting as well as the
>> > mon_status for each monitor:
>> >
>> >
>> > # ceph -s ; ceph health detail
>> >     cluster XXX
>> >      health HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
>> >      monmap e2: 3 mons at {ceph-mon-01=10.1.1.64:6789/0,ceph-mon-02=10.1.1.65:6789/0,ceph-mon-03=10.1.1.66:6789/0}, election epoch 932, quorum 0,1 ceph-mon-01,ceph-mon-02
>> >      osdmap e49213: 80 osds: 80 up, 80 in
>> >       pgmap v18242952: 4864 pgs, 5 pools, 69910 GB data, 17638 kobjects
>> >             197 TB used, 95904 GB / 290 TB avail
>> >                    8 active+clean+scrubbing+deep
>> >                 4856 active+clean
>> >   client io 6893 kB/s rd, 5657 kB/s wr, 2090 op/s
>> > HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
>> > mon.ceph-mon-03 (rank 2) addr 10.1.1.66:6789/0 is down (out of quorum)
>> >
>> >
>> > { "name": "ceph-mon-01",
>> >   "rank": 0,
>> >   "state": "leader",
>> >   "election_epoch": 932,
>> >   "quorum": [
>> >         0,
>> >         1],
>> >   "outside_quorum": [],
>> >   "extra_probe_peers": [],
>> >   "sync_provider": [],
>> >   "monmap": { "epoch": 2,
>> >       "fsid": "XXX",
>> >       "modified": "0.000000",
>> >       "created": "0.000000",
>> >       "mons": [
>> >             { "rank": 0,
>> >               "name": "ceph-mon-01",
>> >               "addr": "10.1.1.64:6789\/0"},
>> >             { "rank": 1,
>> >               "name": "ceph-mon-02",
>> >               "addr": "10.1.1.65:6789\/0"},
>> >             { "rank": 2,
>> >               "name": "ceph-mon-03",
>> >               "addr": "10.1.1.66:6789\/0"}]}}
>> >
>> >
>> > { "name": "ceph-mon-02",
>> >   "rank": 1,
>> >   "state": "peon",
>> >   "election_epoch": 932,
>> >   "quorum": [
>> >         0,
>> >         1],
>> >   "outside_quorum": [],
>> >   "extra_probe_peers": [],
>> >   "sync_provider": [],
>> >   "monmap": { "epoch": 2,
>> >       "fsid": "XXX",
>> >       "modified": "0.000000",
>> >       "created": "0.000000",
>> >       "mons": [
>> >             { "rank": 0,
>> >               "name": "ceph-mon-01",
>> >               "addr": "10.1.1.64:6789\/0"},
>> >             { "rank": 1,
>> >               "name": "ceph-mon-02",
>> >               "addr": "10.1.1.65:6789\/0"},
>> >             { "rank": 2,
>> >               "name": "ceph-mon-03",
>> >               "addr": "10.1.1.66:6789\/0"}]}}
>> >
>> >
>> > { "name": "ceph-mon-03",
>> >   "rank": 2,
>> >   "state": "electing",
>> >   "election_epoch": 931,
>> >   "quorum": [],
>> >   "outside_quorum": [],
>> >   "extra_probe_peers": [],
>> >   "sync_provider": [],
>> >   "monmap": { "epoch": 2,
>> >       "fsid": "XXX",
>> >       "modified": "0.000000",
>> >       "created": "0.000000",
>> >       "mons": [
>> >             { "rank": 0,
>> >               "name": "ceph-mon-01",
>> >               "addr": "10.1.1.64:6789\/0"},
>> >             { "rank": 1,
>> >               "name": "ceph-mon-02",
>> >               "addr": "10.1.1.65:6789\/0"},
>> >             { "rank": 2,
>> >               "name": "ceph-mon-03",
>> >               "addr": "10.1.1.66:6789\/0"}]}}
>> >
>> >
>> > Any help or advice is appreciated.
>>
>> It looks like your mon has been unable to communicate with the other
>> hosts, presumably since the time you un-/replugged it. Check your
>> switch port configuration. Also, make sure that from 10.1.1.66, you
>> can not only ping 10.1.1.64 and 10.1.1.65, but make a TCP connection
>> on port 6789. With that out of the way, check your mon log on
>> ceph-mon-03 (in /var/log/ceph/mon); it should provide some additional
>> insight into the problem.
>>
>> Cheers,
>> Florian