Can you get the quorum and related dumps out of the admin socket for
each running monitor and see what they say?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Tue, Jul 23, 2013 at 4:51 PM, Mandell Degerness
<mandell@xxxxxxxxxxxxxxx> wrote:
> One of our tests last night failed in a weird way. We started with a
> three node cluster, with three monitors, expanded to a 5 node cluster
> with 5 monitors and dropped back to a 4 node cluster with three
> monitors.
>
> The sequence of events was:
>
> start 3 monitors (monitors 0, 1, 2) - monmap e1
> add one node
> restart the 3 monitors
> add another node
> add monitor 4 - monmap e2
> restart monitor 0
> add monitor 3 - monmap e3
> restart monitor 1
> restart monitor 2
> shutdown server with monitor 4 on it
> remove monitor 4 - monmap e4
> restart monitor 0
> mon.0 had an odd time sync problem and respawned
> stop monitor 3
> remove monitor 3
>
> At that point (08:23:52 in the log), ceph stopped responding (as if
> quorum was lost). Note that we do not see a new monmap (e5) created
> by the removal of monitor 3.
>
> See the (sort of) full log at:
> https://gist.github.com/mdegerne/06fa38243bd462c46d39
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
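
For anyone following along, the admin socket dumps requested above could be collected with something like the sketch below. It is only an illustration under stated assumptions: it assumes the default socket path /var/run/ceph/ceph-mon.<id>.asok (check the "admin socket" setting in ceph.conf if yours differs), and it assumes the monitor answers the mon_status and quorum_status commands on that socket. The helper names are made up for this example. It has to be run on each node that hosts a running monitor, passing the local monitor IDs as arguments.

```python
import json
import subprocess
import sys

# Assumption: default admin socket location; adjust if ceph.conf overrides it.
SOCKET_TEMPLATE = "/var/run/ceph/ceph-mon.{mon_id}.asok"

# Assumption: these are the dumps of interest for a quorum problem.
COMMANDS = ("mon_status", "quorum_status")


def admin_socket_command(mon_id, command, socket_template=SOCKET_TEMPLATE):
    """Build the argv for one admin socket query against one local monitor."""
    socket_path = socket_template.format(mon_id=mon_id)
    return ["ceph", "--admin-daemon", socket_path, command]


def dump_monitor(mon_id):
    """Run each query and return {command: parsed JSON or an error string}."""
    results = {}
    for command in COMMANDS:
        argv = admin_socket_command(mon_id, command)
        try:
            out = subprocess.check_output(argv, stderr=subprocess.STDOUT)
            results[command] = json.loads(out.decode())
        except (OSError, subprocess.CalledProcessError, ValueError) as exc:
            # Keep going: a monitor that is out of quorum may refuse a query.
            results[command] = "failed: {}".format(exc)
    return results


if __name__ == "__main__":
    # Monitor IDs come from the command line, e.g. "0" for mon.0.
    for mon_id in sys.argv[1:]:
        print("mon.{}".format(mon_id))
        print(json.dumps(dump_monitor(mon_id), indent=2))
```

Saved as, say, mon_dump.py (a hypothetical name), it would be invoked as "python mon_dump.py 0" on the node running mon.0. Comparing the monmap epoch and the quorum members reported by each monitor should show whether they all agree on e4, or whether one of them is still on an older map.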