Hi,

Thanks for the advice.  I feel pretty dumb, as it does indeed look like a
simple networking issue.
You know how you check things 5 times and miss the most obvious one... :)

On 17 September 2014 16:04, Florian Haas <florian at hastexo.com> wrote:

> On Wed, Sep 17, 2014 at 1:58 PM, James Eckersall
> <james.eckersall at gmail.com> wrote:
> > Hi,
> >
> > I have a ceph cluster running 0.80.1 on Ubuntu 14.04.  I have 3 monitors
> > and 4 OSD nodes currently.
> >
> > Everything has been running great up until today, when I hit an issue
> > with the monitors.
> > I moved mon03 to a different switch port, so it would have temporarily
> > lost connectivity.
> > Since then, the cluster reports that mon as down, although it's
> > definitely up.
> > I've tried restarting the mon services on all three mons, but that
> > hasn't made a difference.
> > I definitely, 100% do not have any clock skew on any of the mons.  This
> > has been triple-checked, as the ceph docs suggest it might be the cause
> > of this issue.
> >
> > Here is what ceph -s and ceph health detail report, as well as the
> > mon_status for each monitor:
> >
> >
> > # ceph -s ; ceph health detail
> >     cluster XXX
> >      health HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
> >      monmap e2: 3 mons at
> > {ceph-mon-01=10.1.1.64:6789/0,ceph-mon-02=10.1.1.65:6789/0,ceph-mon-03=10.1.1.66:6789/0},
> > election epoch 932, quorum 0,1 ceph-mon-01,ceph-mon-02
> >      osdmap e49213: 80 osds: 80 up, 80 in
> >       pgmap v18242952: 4864 pgs, 5 pools, 69910 GB data, 17638 kobjects
> >             197 TB used, 95904 GB / 290 TB avail
> >                    8 active+clean+scrubbing+deep
> >                 4856 active+clean
> >   client io 6893 kB/s rd, 5657 kB/s wr, 2090 op/s
> > HEALTH_WARN 1 mons down, quorum 0,1 ceph-mon-01,ceph-mon-02
> > mon.ceph-mon-03 (rank 2) addr 10.1.1.66:6789/0 is down (out of quorum)
> >
> >
> > { "name": "ceph-mon-01",
> >   "rank": 0,
> >   "state": "leader",
> >   "election_epoch": 932,
> >   "quorum": [
> >         0,
> >         1],
> >   "outside_quorum": [],
> >   "extra_probe_peers": [],
> >   "sync_provider": [],
> >   "monmap": { "epoch": 2,
> >       "fsid": "XXX",
> >       "modified": "0.000000",
> >       "created": "0.000000",
> >       "mons": [
> >             { "rank": 0,
> >               "name": "ceph-mon-01",
> >               "addr": "10.1.1.64:6789\/0"},
> >             { "rank": 1,
> >               "name": "ceph-mon-02",
> >               "addr": "10.1.1.65:6789\/0"},
> >             { "rank": 2,
> >               "name": "ceph-mon-03",
> >               "addr": "10.1.1.66:6789\/0"}]}}
> >
> >
> > { "name": "ceph-mon-02",
> >   "rank": 1,
> >   "state": "peon",
> >   "election_epoch": 932,
> >   "quorum": [
> >         0,
> >         1],
> >   "outside_quorum": [],
> >   "extra_probe_peers": [],
> >   "sync_provider": [],
> >   "monmap": { "epoch": 2,
> >       "fsid": "XXX",
> >       "modified": "0.000000",
> >       "created": "0.000000",
> >       "mons": [
> >             { "rank": 0,
> >               "name": "ceph-mon-01",
> >               "addr": "10.1.1.64:6789\/0"},
> >             { "rank": 1,
> >               "name": "ceph-mon-02",
> >               "addr": "10.1.1.65:6789\/0"},
> >             { "rank": 2,
> >               "name": "ceph-mon-03",
> >               "addr": "10.1.1.66:6789\/0"}]}}
> >
> >
> > { "name": "ceph-mon-03",
> >   "rank": 2,
> >   "state": "electing",
> >   "election_epoch": 931,
> >   "quorum": [],
> >   "outside_quorum": [],
> >   "extra_probe_peers": [],
> >   "sync_provider": [],
> >   "monmap": { "epoch": 2,
> >       "fsid": "XXX",
> >       "modified": "0.000000",
> >       "created": "0.000000",
> >       "mons": [
> >             { "rank": 0,
> >               "name": "ceph-mon-01",
> >               "addr": "10.1.1.64:6789\/0"},
> >             { "rank": 1,
> >               "name": "ceph-mon-02",
> >               "addr": "10.1.1.65:6789\/0"},
> >             { "rank": 2,
> >               "name": "ceph-mon-03",
> >               "addr": "10.1.1.66:6789\/0"}]}}
> >
> > Any help or advice is appreciated.
>
> It looks like your mon has been unable to communicate with the other
> hosts, presumably since the time you un-/replugged it. Check your
> switch port configuration. Also, make sure that from 10.1.1.66 you can
> not only ping 10.1.1.64 and 10.1.1.65, but also make a TCP connection
> on port 6789. With that out of the way, check your mon log on
> ceph-mon-03 (in /var/log/ceph/mon); it should provide some additional
> insight into the problem.
>
> Cheers,
> Florian
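
For anyone hitting the same symptom, the checks Florian suggests could look
roughly like the sketch below, run from ceph-mon-03 (10.1.1.66).  This is only
a sketch, not taken from the thread: nc (or telnet) is assumed to be
installed, and the exact mon log file name is an assumption; on Ubuntu it is
typically /var/log/ceph/ceph-mon.<mon-name>.log rather than /var/log/ceph/mon.

    # Basic reachability from ceph-mon-03 to the other two monitors
    ping -c 3 10.1.1.64
    ping -c 3 10.1.1.65

    # Confirm a TCP connection to the monitor port actually succeeds
    # (telnet 10.1.1.64 6789 works too if nc is not installed)
    nc -zv 10.1.1.64 6789
    nc -zv 10.1.1.65 6789

    # Then check the local mon log for probe/election errors
    # (log path assumed; adjust to your cluster's mon name)
    tail -n 100 /var/log/ceph/ceph-mon.ceph-mon-03.log

If ping succeeds but the TCP check fails, a switch port setting (VLAN, ACL) or
a host firewall rule is the usual culprit, which fits the simple networking
issue found here.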