Hi,

I was handed a Ceph cluster that had just lost quorum because 2 of its 3 mons (b, c) ran out of disk space (each using up 15 GB). We are trying to rescue this cluster without service downtime. We freed up enough space to keep mon b running a while longer, which succeeded: quorum was restored (a, b), while mon c remained offline.

Even though we have freed up some space on mon c's disk as well, that mon just won't start. Its log file says only

    ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60), process ceph-mon, pid 27846

and that's all she wrote, even when starting ceph-mon with -d, mind you.

So we had a cluster with 2/3 mons up and wanted to add another mon, since it was only a matter of time until mon b failed again due to disk space. I added mon.g to the cluster, which took a long while to sync but now reports running. Then mon.h was added for the same reason; mon.h fails to start in much the same way as mon.c does.

Still, that should leave us with 3/5 mons up. However, running "ceph daemon mon.{g,h} mon_status" on the respective nodes also blocks; the only output we get from those are fault messages.

And now mon.g has apparently crashed:

    2014-02-28 00:11:48.861263 7f4728042700 -1 mon/Monitor.cc: In function 'void Monitor::sync_timeout(entity_inst_t&)' thread 7f4728042700 time 2014-02-28 00:11:48.782305
    mon/Monitor.cc: 1099: FAILED assert(sync_state == SYNC_STATE_CHUNKS)
    ...

and it now blocks on startup much like c and h do.

Long story short: is it possible to add 0.61.9 mons to a cluster running 0.61.2 on the two alive mons and all the OSDs? I'm guessing this is our last shot at rescuing the cluster without downtime.

KR,
Marc
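
P.S. For reference, adding mon.g followed roughly the standard procedure sketched below; the monitor IP and the temp paths here are illustrative, not our actual values:

    # run on the new mon host; 10.0.0.7 and the /tmp paths are examples
    ceph auth get mon. -o /tmp/mon.keyring       # fetch the mon keyring
    ceph mon getmap -o /tmp/monmap               # fetch the current monmap
    ceph-mon -i g --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    ceph mon add g 10.0.0.7:6789                 # register mon.g in the monmap
    service ceph start mon.g                     # start the daemon; it then syncs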
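
P.P.S. In case it matters for debugging: as I understand it, the "ceph daemon" form above is just shorthand for talking to the monitor's local admin socket, so the direct invocation (assuming the default socket path, which may differ on your install) would be:

    # query mon.g via its admin socket; the .asok path assumes the default location
    ceph --admin-daemon /var/run/ceph/ceph-mon.g.asok mon_status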