I have a Hammer cluster (0.94.9) that died a while ago, consisting of 3 monitors and 630 OSDs spread across 21 storage hosts. The cluster's monitors all died due to leveldb corruption and the cluster was shut down. I was finally given word that I could try to revive the cluster this week!
https://github.com/ceph/ceph/blob/hammer/doc/rados/troubleshooting/troubleshooting-mon.rst#recovery-using-osds
I see that the latest hammer code on GitHub has the ceph-monstore-tool rebuild backport, and that is what I am running on the cluster now (ceph version 0.94.9-4530-g83af8cd (83af8cdaaa6d94404e6146b68e532a784e3cc99c)). I was able to scrape all 630 of the OSDs and am left with a 1.1G store.db directory. Using Python I was able to list all of the keys and values successfully (rough sketch below), which was very promising. That said, I cannot run the final command in the recovery-using-osds article (ceph-monstore-tool rebuild) successfully.
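For reference, this is roughly how I listed the keys, just a minimal sketch: it assumes the plyvel LevelDB binding and a made-up path to the scraped store (the standalone "leveldb" module works the same way via RangeIter()):

    # minimal sketch -- assumes the plyvel binding; path below is an example
    import plyvel

    db = plyvel.DB('/root/mon-store/store.db')   # wherever your scraped store.db lives
    for key, value in db.iterator():
        # print each key and the size of its value
        print(key, len(value))
    db.close()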
Whenever I run the tool (with the newly created admin keyring or with my existing one), it errors with the following:
- 0> 2017-02-17 15:00:47.516901 7f8b4d7408c0 -1 ./mon/MonitorDBStore.h: In function 'KeyValueDB::Iterator MonitorDBStore::get_iterator(const string&)' thread 7f8b4d7408c0 time 2017-02-07 15:00:47.516319
The complete trace is here:
http://pastebin.com/NQE8uYiG
Can anyone lend a hand and tell me what may be wrong? I am able to iterate over the leveldb database in Python, so the structure should be at least somewhat intact, right? Am I SOL at this point? The cluster is no longer in production, and while I don't have months of time, I would really like to recover it just to see if it is at all possible.
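In case it helps with diagnosing, this is roughly how I'm checking which prefixes the rebuilt store contains. Again just a sketch: it assumes plyvel and my reading that MonitorDBStore joins the prefix and key name with a NUL byte, so the split below may be wrong:

    # sketch: count keys per MonitorDBStore prefix (osdmap, paxos, auth, ...)
    # assumption: keys are stored as <prefix>\0<name>
    from collections import Counter
    import plyvel

    db = plyvel.DB('/root/mon-store/store.db')   # same scraped store.db as above
    counts = Counter()
    for key, _ in db.iterator():
        prefix, _, _ = key.partition(b'\x00')
        counts[prefix] += 1
    for prefix, n in counts.most_common():
        print(prefix.decode('utf-8', 'replace'), n)
    db.close()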
- Sean