Hi! Now I have the same situation on all monitors, without any reboot:

root@bes-mon3:~# ceph --verbose -w
Error initializing cluster client: Error

root@bes-mon3:~# ceph --admin-daemon /var/run/ceph/ceph-mon.3.asok mon_status
{ "name": "3",
  "rank": 2,
  "state": "peon",
  "election_epoch": 86,
  "quorum": [
        0,
        1,
        2],
  "outside_quorum": [],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": { "epoch": 3,
      "fsid": "fffeafa2-a664-48a7-979a-517e3ffa0da1",
      "modified": "2014-03-15 11:52:21.182767",
      "created": "2014-03-15 11:51:42.321256",
      "mons": [
            { "rank": 0,
              "name": "1",
              "addr": "10.92.8.80:6789\/0"},
            { "rank": 1,
              "name": "2",
              "addr": "10.92.8.81:6789\/0"},
            { "rank": 2,
              "name": "3",
              "addr": "10.92.8.82:6789\/0"}]}}

root@bes-mon3:~# ceph --admin-daemon /var/run/ceph/ceph-mon.3.asok quorum_status
{ "election_epoch": 86,
  "quorum": [
        0,
        1,
        2],
  "quorum_names": [
        "1",
        "2",
        "3"],
  "quorum_leader_name": "1",
  "monmap": { "epoch": 3,
      "fsid": "fffeafa2-a664-48a7-979a-517e3ffa0da1",
      "modified": "2014-03-15 11:52:21.182767",
      "created": "2014-03-15 11:51:42.321256",
      "mons": [
            { "rank": 0,
              "name": "1",
              "addr": "10.92.8.80:6789\/0"},
            { "rank": 1,
              "name": "2",
              "addr": "10.92.8.81:6789\/0"},
            { "rank": 2,
              "name": "3",
              "addr": "10.92.8.82:6789\/0"}]}}

root@bes-mon3:~# ceph --admin-daemon /var/run/ceph/ceph-mon.3.asok version
{"version":"0.72.2"}

The rbd image mounted from this cluster seems to be OK; reading and writing don't hang.

Pavel.

On 23 March 2014, at 8:49, Kyle Bader <kyle.bader@xxxxxxxxx> wrote:

>> I have two nodes with 8 OSDs on each. The first node runs 2 monitors on
>> different virtual machines (mon.1 and mon.2); the second node runs mon.3.
>> After several reboots (I have tested power failure scenarios), "ceph -w"
>> on node 2 always fails with the message:
>>
>> root@bes-mon3:~# ceph --verbose -w
>> Error initializing cluster client: Error
>
> The cluster is simply protecting itself from a split-brain situation.
> Say you have:
>
> mon.1 mon.2 mon.3
>
> If mon.1 fails, no big deal: you still have 2/3, so no problem.
>
> Now instead, say mon.1 is separated from mon.2 and mon.3 because of a
> network partition (trunk failure, whatever). If one monitor of the
> three could elect itself as leader, then you might have divergence
> between your monitors: self-elected mon.1 thinks it's the leader, while
> mon.{2,3} have elected a leader amongst themselves. The harsh reality
> is you really need to have monitors on 3 distinct physical hosts to
> protect against the failure of a physical host.
>
> --
>
> Kyle
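
To make the majority rule above concrete: with N monitors, a quorum needs a strict majority, i.e. floor(N/2) + 1 of them. A minimal sketch of that arithmetic (illustrative Python only, not Ceph's actual election code), applied to the two-node layout described in this thread:

    # Illustrative sketch of monitor quorum arithmetic -- not Ceph's code.

    def has_quorum(reachable: int, total: int) -> bool:
        """A monitor group may form a quorum only with a strict majority."""
        return reachable > total // 2

    # Three monitors, one fails or is partitioned off: quorum survives.
    assert has_quorum(2, 3)      # mon.2 + mon.3 keep serving
    assert not has_quorum(1, 3)  # isolated mon.1 refuses to act alone

    # The layout in this thread: mon.1 and mon.2 share one physical node.
    # Losing that node removes two monitors at once, so the lone survivor
    # on the other node cannot form a quorum by itself.
    assert not has_quorum(1, 3)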
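
Separately, the "Error initializing cluster client" message at the top of the thread appears to come from the client side (librados) during initialization, which would be consistent with the monitors themselves reporting a healthy quorum over their admin sockets. A hedged way to probe the same client path, assuming the python-rados bindings are installed and the default /etc/ceph/ceph.conf and admin keyring are in use:

    # Hedged sketch: exercise the librados client path that "ceph -w" uses.
    # Assumes python-rados is installed and the default paths below exist.
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # raises if conf is unreadable
    cluster.connect()  # raises on keyring/auth or monitor-connection problems
    print("connected, fsid =", cluster.get_fsid())
    cluster.shutdown()

If the Rados() constructor itself raises, the problem is local (config or keyring); if connect() raises, the client can read its config but cannot reach or authenticate to the monitors.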