Hi,

I have a Ceph cluster running 0.80.1 on Ubuntu 14.04, with 3 monitors and 4 OSD nodes. Everything had been running great until today, when I ran into an issue with the monitors. I moved mon03 to a different switchport, so it would have briefly lost connectivity. Since then the cluster has been reporting that mon as down, although it is definitely up. I've tried restarting the mon services on all three mons, but that hasn't made a difference. I definitely, 100% do not have any clock skew on any of the mons; this has been triple-checked, since the Ceph docs suggest that can cause this kind of issue.

Here is what ceph -s and ceph health detail are reporting, as well as the mon_status for each monitor:

# ceph -s ; ceph health detail
    cluster XXX
     health HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
     monmap e2: 3 mons at {ceph-mon-01=10.1.1.64:6789/0,ceph-mon-02=10.1.1.65:6789/0,ceph-mon-03=10.1.1.66:6789/0}, election epoch 932, quorum 0,1 ceph-mon-01,ceph-mon-02
     osdmap e49213: 80 osds: 80 up, 80 in
      pgmap v18242952: 4864 pgs, 5 pools, 69910 GB data, 17638 kobjects
            197 TB used, 95904 GB / 290 TB avail
                   8 active+clean+scrubbing+deep
                4856 active+clean
  client io 6893 kB/s rd, 5657 kB/s wr, 2090 op/s
HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
mon.ceph-mon-03 (rank 2) addr 10.1.1.66:6789/0 is down (out of quorum)

mon_status from ceph-mon-01:

{ "name": "ceph-mon-01",
  "rank": 0,
  "state": "leader",
  "election_epoch": 932,
  "quorum": [
        0,
        1],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 2,
      "fsid": "XXX",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "ceph-mon-01",
              "addr": "10.1.1.64:6789\/0"},
            { "rank": 1,
              "name": "ceph-mon-02",
              "addr": "10.1.1.65:6789\/0"},
            { "rank": 2,
              "name": "ceph-mon-03",
              "addr": "10.1.1.66:6789\/0"}]}}

mon_status from ceph-mon-02:

{ "name": "ceph-mon-02",
  "rank": 1,
  "state": "peon",
  "election_epoch": 932,
  "quorum": [
        0,
        1],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 2,
      "fsid": "XXX",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "ceph-mon-01",
              "addr": "10.1.1.64:6789\/0"},
            { "rank": 1,
              "name": "ceph-mon-02",
              "addr": "10.1.1.65:6789\/0"},
            { "rank": 2,
              "name": "ceph-mon-03",
              "addr": "10.1.1.66:6789\/0"}]}}

mon_status from ceph-mon-03:

{ "name": "ceph-mon-03",
  "rank": 2,
  "state": "electing",
  "election_epoch": 931,
  "quorum": [],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 2,
      "fsid": "XXX",
      "modified": "0.000000",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "ceph-mon-01",
              "addr": "10.1.1.64:6789\/0"},
            { "rank": 1,
              "name": "ceph-mon-02",
              "addr": "10.1.1.65:6789\/0"},
            { "rank": 2,
              "name": "ceph-mon-03",
              "addr": "10.1.1.66:6789\/0"}]}}

Any help or advice is appreciated.

Regards,
James
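
P.S. For completeness, this is roughly how the checks and restarts mentioned above were done; the commands are from memory, so treat the exact syntax as approximate rather than verbatim:

  ntpq -p
      # on each mon, to confirm NTP is synced and offsets are tiny (rules out clock skew)
  nc -zv 10.1.1.66 6789
      # from mon-01/mon-02, to confirm mon-03 is still reachable on the monitor port after the switchport move
  restart ceph-mon id=ceph-mon-03
      # upstart restart of the mon daemon on Ubuntu 14.04; run the equivalent on each mon node
  ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon-03.asok mon_status
      # local mon_status via the admin socket, which is how the dumps above were collected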