Mon won't start, possibly due to corrupt disk?

lincolnb@xxxxxxxxxxxx (Lincoln Bryant) · Thu, 17 Jul 2014 15:39:10 -0500

Hi all,

I tried restarting my mon today and found that it no longer starts. Every attempt to start it fails with errors like these:

   -3> 2014-07-17 15:12:32.738510 7f25b0921780 10 mon.a@-1(probing).auth v1537 update_from_paxos
   -2> 2014-07-17 15:12:32.738526 7f25b0921780 10 mon.a@-1(probing).auth v1537 update_from_paxos version 1537 keys ver 0 latest 0
   -1> 2014-07-17 15:12:32.738532 7f25b0921780 10 mon.a@-1(probing).auth v1537 update_from_paxos key server version 0
    0> 2014-07-17 15:12:32.739836 7f25b0921780 -1 mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread 7f25b0921780 time 2014-07-17 15:12:32.738549
mon/AuthMonitor.cc: 155: FAILED assert(ret == 0)

After talking with Greg on IRC, it seems that the disk state is corrupted. The failure appears to be CephX related, even though we do not have CephX enabled on this cluster.

At Greg's request, I've attached the mon log to this mail in the hope that it helps pin down exactly what is corrupted. I've set debug {mon,paxos,auth,keyvaluestore} to 20 in ceph.conf.
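
For reference, those debug settings correspond to roughly the following ceph.conf fragment. This is only a sketch: placing the options under [mon] is an assumption, and putting them under [global] would apply them to the mon as well.

    [mon]
        # Assumed placement; raises log verbosity for the subsystems named above
        debug mon = 20
        debug paxos = 20
        debug auth = 20
        debug keyvaluestore = 20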

I'm hoping to be able to recover. Unfortunately we made the mistake of deploying only a single mon for this cluster, and there is some data I'd like to preserve.

Thanks for any help,
Lincoln Bryant

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ceph-mon.a.log
Type: application/octet-stream
Size: 626179 bytes
Desc: not available
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20140717/70754e2c/attachment.obj>