One of our tests last night failed in a weird way. We started with a three-node cluster running three monitors, expanded to a five-node cluster with five monitors, and then dropped back to a four-node cluster with three monitors.

The sequence of events was:

1. Start 3 monitors (monitors 0, 1, 2) -> monmap e1
2. Add one node
3. Restart the 3 monitors
4. Add another node
5. Add monitor 4 -> monmap e2
6. Restart monitor 0
7. Add monitor 3 -> monmap e3
8. Restart monitor 1
9. Restart monitor 2
10. Shut down the server with monitor 4 on it
11. Remove monitor 4 -> monmap e4
12. Restart monitor 0
13. mon.0 had an odd time sync problem and respawned
14. Stop monitor 3
15. Remove monitor 3

At that point (08:23:52 in the log), Ceph stopped responding, as if quorum had been lost. Note that we do not see a new monmap (e5) created by the removal of monitor 3.

See the (sort of) full log at: https://gist.github.com/mdegerne/06fa38243bd462c46d39

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
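The "as if quorum was lost" symptom can be checked against the quorum arithmetic for each monmap epoch. Below is a minimal Python sketch. It assumes Ceph's rule that a strict majority of the monitors listed in the current monmap must be up and in quorum, and it uses membership sets transcribed from the sequence of events above. `quorum_size` and `has_quorum` are hypothetical helpers for illustration, not Ceph APIs.

```python
# Monmap membership per epoch, transcribed from the sequence of events above.
MONMAPS = {
    "e1": {0, 1, 2},
    "e2": {0, 1, 2, 4},
    "e3": {0, 1, 2, 3, 4},
    "e4": {0, 1, 2, 3},
}


def quorum_size(members: set[int]) -> int:
    """Smallest strict majority of the monitors listed in a monmap."""
    return len(members) // 2 + 1


def has_quorum(members: set[int], up: set[int]) -> bool:
    """True if the monitors that are up (and in the map) form a strict majority."""
    return len(members & up) >= quorum_size(members)


if __name__ == "__main__":
    for epoch, members in MONMAPS.items():
        print(epoch, sorted(members), "needs", quorum_size(members))

    e4 = MONMAPS["e4"]
    # After "stop monitor 3", with mon.0, mon.1 and mon.2 all in quorum.
    print("mon.3 stopped, 0/1/2 up:", has_quorum(e4, {0, 1, 2}))
    # Hypothetical case: mon.0 has not yet rejoined after its respawn.
    print("mon.3 stopped, mon.0 still out:", has_quorum(e4, {1, 2}))
```

Under e4 (four monitors, majority of three), stopping monitor 3 leaves exactly three, so quorum survives only if mon.0, mon.1 and mon.2 are all in it. If mon.0 had not rejoined after respawning, the removal of monitor 3 could not be committed, which would fit the missing e5. That is a hypothesis; the log excerpt does not confirm it.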