Like you say, nothing works when you're down to 1 mon (the cluster halts, hunting for at least 1 more mon to form a quorum). You'll need to manually fix it or bring the other mons back online. I've had to use this page a few times on my 3-mon cluster while testing, turning osds/mons/servers on and off:

http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

I realize it is maybe not what you want, but at the very least, if you can identify the mon with the latest monmap, it is surprisingly easy to get things going again with just 1 monitor (in your case, the surviving mon).

Cheers,
Martin

On Thu, Feb 14, 2013 at 10:21 AM, Wolfgang Hennerbichler
<wolfgang.hennerbichler@xxxxxxxxxxxxxxxx> wrote:
> My fellow ceph users,
>
> I'm trying to figure out some disaster recovery scenarios.
> I have a ceph cluster on 2 sites. One site has 2 mons, the other site
> has 1 mon.
> Let's assume one of the sites loses power and the nodes there go down.
> That's OK if it's the site with the 1 mon, but if it's the site with the
> 2 mons, nothing will work anymore. That's a feature, I've been told. I'd
> love to have some knobs and switches here, but it seems there aren't any.
> OK. So let's try to recover the site that still has power from that
> failure. My strategy: remove the mons without power:
>
> # ceph -d mon delete mon.1
>
> => ceph just sits there and waits forever.
>
> Damn it. So how DO I recover? Is there a better strategy if I have a
> cluster on 2 sites?
>
> Wolfgang
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
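For the archives, the procedure on that page boils down to roughly the following. This is only a sketch: the monitor IDs (`a` for the survivor, `b` and `c` for the dead site) and the `/tmp/monmap` path are placeholders, and you'd want the surviving ceph-mon daemon stopped before touching its store.

```shell
# On the surviving monitor host, with its ceph-mon stopped,
# extract the most recent monmap that monitor knows about:
ceph-mon -i a --extract-monmap /tmp/monmap

# Remove the unreachable monitors from the map
# (here b and c are the mons at the powered-off site):
monmaptool /tmp/monmap --rm b --rm c

# Inject the trimmed map back into the surviving monitor's store:
ceph-mon -i a --inject-monmap /tmp/monmap

# Restart the monitor; with only itself in the map it can
# form a quorum of 1 and the cluster comes back:
service ceph start mon.a
```

It's prudent to back up the surviving monitor's data directory before injecting a modified monmap, since there's no easy undo.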