Re: Fwd: lost power. monitors died. Cephx errors now

So the cluster has been down since around 8/10/2016. I have since rebooted the cluster in order to try to use the new ceph-monstore-tool rebuild functionality.

I built the debian packages for the recently backported hammer tools and installed them across all of the servers:

root@kh08-8:/home/lacadmin# ceph --version
ceph version 0.94.9-4530-g83af8cd (83af8cdaaa6d94404e6146b68e532a784e3cc99c)


From here I ran the following:
------------------------------------------------------------------------------
#!/bin/bash
# Gather cluster maps from every OSD into a single mon store, carrying the
# accumulated store from host to host so each OSD appends to the same db.
set -e
store="/home/localadmin/monstore/"

rm -rf "${store}"
mkdir -p "${store}"

for host in kh{08..10}-{1..7};
do
    # push the store gathered so far to the next host
    rsync -Pav "${store}" "${host}:${store}"
    for osd in $(ssh ${host} 'ls /var/lib/ceph/osd/ | grep "^ceph-"');
    do
        echo "${osd}"
        ssh ${host} "sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/${osd} --journal-path /var/lib/ceph/osd/${osd}/journal --op update-mon-db --mon-store-path ${store}"
    done
    # hand the gathered store back to the unprivileged user before pulling it back
    ssh ${host} "sudo chown -R lacadmin. ${store}"
    rsync -Pav "${host}:${store}" "${store}"
done
------------------------------------------------------------------------------

This generated a 1.1G store.db directory.
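As a quick sanity check before going further, something like the following (same dump-keys subcommand I use later against the dead monitors) should confirm the gathered store actually contains keys::

du -sh /home/localadmin/monstore
ceph-monstore-tool /home/localadmin/monstore dump-keys | head -20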

From here I ran the following, per the GitHub guide ( https://github.com/ceph/ceph/blob/master/doc/rados/troubleshooting/troubleshooting-mon.rst ):

ceph-authtool ./admin.keyring -n mon. --cap mon 'allow *'
ceph-authtool -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'

which gave me the following keyring::
------------------------------------------------------------------------------
[mon.]
key = AAAAAAAAAAAAAAAA
caps mon = "allow *"
[client.admin]
key = AAAAAAAAAAAAAAAA
caps mds = "allow *"
caps mon = "allow *"
caps osd = "allow *"
------------------------------------------------------------------------------
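For comparison -- and I may be misreading the guide -- its version of those two ceph-authtool calls generates a key for each entity and writes both entries into the same keyring file, something like::
------------------------------------------------------------------------------
ceph-authtool /home/localadmin/admin.keyring --create-keyring --gen-key -n mon. --cap mon 'allow *'
ceph-authtool /home/localadmin/admin.keyring --gen-key -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
------------------------------------------------------------------------------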

Either way, that keyring looks like it shouldn't work, but going with it anyway, I tried using ceph-monstore-tool to rebuild from the monstore gathered from all 630 OSDs. Instead, I am met with a crash dump T_T

------------------------------------------------------------------------------
ceph-monstore-tool /home/localadmin/monstore rebuild -- --keyring /home/localadmin/admin.keyring

*** Caught signal (Segmentation fault) **
 in thread 7f10cd6d88c0
 ceph version 0.94.9-4530-g83af8cd (83af8cdaaa6d94404e6146b68e532a784e3cc99c)
 1: ceph-monstore-tool() [0x5e960a]
 2: (()+0x10330) [0x7f10cc5c8330]
 3: (strlen()+0x2a) [0x7f10cac629da]
 4: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)+0x25) [0x7f10cb576d75]
 5: (rebuild_monstore(char const*, std::vector<std::string, std::allocator<std::string> >&, MonitorDBStore&)+0x878) [0x544958]
 6: (main()+0x3e05) [0x52c035]
 7: (__libc_start_main()+0xf5) [0x7f10cabfbf45]
 8: ceph-monstore-tool() [0x540347]
2017-02-06 17:35:59.885651 7f10cd6d88c0 -1 *** Caught signal (Segmentation fault) **
 in thread 7f10cd6d88c0

 ceph version 0.94.9-4530-g83af8cd (83af8cdaaa6d94404e6146b68e532a784e3cc99c)
 1: ceph-monstore-tool() [0x5e960a]
 2: (()+0x10330) [0x7f10cc5c8330]
 3: (strlen()+0x2a) [0x7f10cac629da]
 4: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)+0x25) [0x7f10cb576d75]
 5: (rebuild_monstore(char const*, std::vector<std::string, std::allocator<std::string> >&, MonitorDBStore&)+0x878) [0x544958]
 6: (main()+0x3e05) [0x52c035]
 7: (__libc_start_main()+0xf5) [0x7f10cabfbf45]
 8: ceph-monstore-tool() [0x540347]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -15> 2017-02-06 17:35:54.362066 7f10cd6d88c0  5 asok(0x355a000) register_command perfcounters_dump hook 0x350a0d0
   -14> 2017-02-06 17:35:54.362122 7f10cd6d88c0  5 asok(0x355a000) register_command 1 hook 0x350a0d0
   -13> 2017-02-06 17:35:54.362137 7f10cd6d88c0  5 asok(0x355a000) register_command perf dump hook 0x350a0d0
   -12> 2017-02-06 17:35:54.362147 7f10cd6d88c0  5 asok(0x355a000) register_command perfcounters_schema hook 0x350a0d0
   -11> 2017-02-06 17:35:54.362157 7f10cd6d88c0  5 asok(0x355a000) register_command 2 hook 0x350a0d0
   -10> 2017-02-06 17:35:54.362161 7f10cd6d88c0  5 asok(0x355a000) register_command perf schema hook 0x350a0d0
    -9> 2017-02-06 17:35:54.362170 7f10cd6d88c0  5 asok(0x355a000) register_command perf reset hook 0x350a0d0
    -8> 2017-02-06 17:35:54.362179 7f10cd6d88c0  5 asok(0x355a000) register_command config show hook 0x350a0d0
    -7> 2017-02-06 17:35:54.362188 7f10cd6d88c0  5 asok(0x355a000) register_command config set hook 0x350a0d0
    -6> 2017-02-06 17:35:54.362193 7f10cd6d88c0  5 asok(0x355a000) register_command config get hook 0x350a0d0
    -5> 2017-02-06 17:35:54.362202 7f10cd6d88c0  5 asok(0x355a000) register_command config diff hook 0x350a0d0
    -4> 2017-02-06 17:35:54.362207 7f10cd6d88c0  5 asok(0x355a000) register_command log flush hook 0x350a0d0
    -3> 2017-02-06 17:35:54.362215 7f10cd6d88c0  5 asok(0x355a000) register_command log dump hook 0x350a0d0
    -2> 2017-02-06 17:35:54.362220 7f10cd6d88c0  5 asok(0x355a000) register_command log reopen hook 0x350a0d0
    -1> 2017-02-06 17:35:54.379684 7f10cd6d88c0  2 auth: KeyRing::load: loaded key file /home/lacadmin/admin.keyring
     0> 2017-02-06 17:35:59.885651 7f10cd6d88c0 -1 *** Caught signal (Segmentation fault) **
 in thread 7f10cd6d88c0

 ceph version 0.94.9-4530-g83af8cd (83af8cdaaa6d94404e6146b68e532a784e3cc99c)
 1: ceph-monstore-tool() [0x5e960a]
 2: (()+0x10330) [0x7f10cc5c8330]
 3: (strlen()+0x2a) [0x7f10cac629da]
 4: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)+0x25) [0x7f10cb576d75]
 5: (rebuild_monstore(char const*, std::vector<std::string, std::allocator<std::string> >&, MonitorDBStore&)+0x878) [0x544958]
 6: (main()+0x3e05) [0x52c035]
 7: (__libc_start_main()+0xf5) [0x7f10cabfbf45]
 8: ceph-monstore-tool() [0x540347]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   1/ 1 ms
  10/10 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent       500
  max_new         1000
  log_file 
--- end dump of recent events ---
Segmentation fault (core dumped)

------------------------------------------------------------------------------

I have tried copying my existing mon. and client.admin keys into the admin.keyring used for the rebuild and it still fails. I am not sure whether this is due to my packages or if something else is wrong. Is there a way to test or to see what may be happening?
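One thing I still plan to try, assuming I can get matching debug symbols (ceph-dbg) installed for my build, is running the rebuild under gdb to get a readable backtrace::

gdb -ex run -ex 'bt full' -ex quit --args \
    ceph-monstore-tool /home/localadmin/monstore rebuild -- --keyring /home/localadmin/admin.keyring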


On Sat, Aug 13, 2016 at 10:36 PM, Sean Sullivan <seapasulli@xxxxxxxxxxxx> wrote:
So with a patched leveldb that skips errors I now have a store.db that I can extract the pg, mon, and osd maps from. That said, when I try to start kh10-8 it bombs out::

---------------------------------------
---------------------------------------
root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8# ceph-mon -i $(hostname) -d
2016-08-13 22:30:54.596039 7fa8b9e088c0  0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 708653
starting mon.kh10-8 rank 2 at 10.64.64.125:6789/0 mon_data /var/lib/ceph/mon/ceph-kh10-8 fsid e452874b-cb29-4468-ac7f-f8901dfccebf
2016-08-13 22:30:54.608150 7fa8b9e088c0  0 starting mon.kh10-8 rank 2 at 10.64.64.125:6789/0 mon_data /var/lib/ceph/mon/ceph-kh10-8 fsid e452874b-cb29-4468-ac7f-f8901dfccebf
2016-08-13 22:30:54.608395 7fa8b9e088c0  1 mon.kh10-8@-1(probing) e1 preinit fsid e452874b-cb29-4468-ac7f-f8901dfccebf
2016-08-13 22:30:54.608617 7fa8b9e088c0  1 mon.kh10-8@-1(probing).paxosservice(pgmap 0..35606392) refresh upgraded, format 0 -> 1
2016-08-13 22:30:54.608629 7fa8b9e088c0  1 mon.kh10-8@-1(probing).pg v0 on_upgrade discarding in-core PGMap
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
  what():  buffer::end_of_buffer
*** Caught signal (Aborted) **
 in thread 7fa8b9e088c0
 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
 1: ceph-mon() [0x9b25ea]
 2: (()+0x10330) [0x7fa8b8f0b330]
 3: (gsignal()+0x37) [0x7fa8b73a8c37]
 4: (abort()+0x148) [0x7fa8b73ac028]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fa8b7cb3535]
 6: (()+0x5e6d6) [0x7fa8b7cb16d6]
 7: (()+0x5e703) [0x7fa8b7cb1703]
 8: (()+0x5e922) [0x7fa8b7cb1922]
 9: ceph-mon() [0x853c39]
 10: (object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x167) [0x894227]
 11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x5ff) [0x894baf]
 12: (PGMap::update_pg(pg_t, ceph::buffer::list&)+0xa3) [0x91a8d3]
 13: (PGMonitor::read_pgmap_full()+0x1d8) [0x68b9b8]
 14: (PGMonitor::update_from_paxos(bool*)+0xbf7) [0x6977b7]
 15: (PaxosService::refresh(bool*)+0x19a) [0x605b5a]
 16: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b1ffb]
 17: (Monitor::init_paxos()+0x85) [0x5b2365]
 18: (Monitor::preinit()+0x7d7) [0x5b6f87]
 19: (main()+0x230c) [0x57853c]
 20: (__libc_start_main()+0xf5) [0x7fa8b7393f45]
 21: ceph-mon() [0x59a3c7]
2016-08-13 22:30:54.611791 7fa8b9e088c0 -1 *** Caught signal (Aborted) **
 in thread 7fa8b9e088c0

 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
 1: ceph-mon() [0x9b25ea]
 2: (()+0x10330) [0x7fa8b8f0b330]
 3: (gsignal()+0x37) [0x7fa8b73a8c37]
 4: (abort()+0x148) [0x7fa8b73ac028]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fa8b7cb3535]
 6: (()+0x5e6d6) [0x7fa8b7cb16d6]
 7: (()+0x5e703) [0x7fa8b7cb1703]
 8: (()+0x5e922) [0x7fa8b7cb1922]
 9: ceph-mon() [0x853c39]
 10: (object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x167) [0x894227]
 11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x5ff) [0x894baf]
 12: (PGMap::update_pg(pg_t, ceph::buffer::list&)+0xa3) [0x91a8d3]
 13: (PGMonitor::read_pgmap_full()+0x1d8) [0x68b9b8]
 14: (PGMonitor::update_from_paxos(bool*)+0xbf7) [0x6977b7]
 15: (PaxosService::refresh(bool*)+0x19a) [0x605b5a]
 16: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b1ffb]
 17: (Monitor::init_paxos()+0x85) [0x5b2365]
 18: (Monitor::preinit()+0x7d7) [0x5b6f87]
 19: (main()+0x230c) [0x57853c]
 20: (__libc_start_main()+0xf5) [0x7fa8b7393f45]
 21: ceph-mon() [0x59a3c7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -33> 2016-08-13 22:30:54.593450 7fa8b9e088c0  5 asok(0x36a20f0) register_command perfcounters_dump hook 0x365a050
   -32> 2016-08-13 22:30:54.593480 7fa8b9e088c0  5 asok(0x36a20f0) register_command 1 hook 0x365a050
   -31> 2016-08-13 22:30:54.593486 7fa8b9e088c0  5 asok(0x36a20f0) register_command perf dump hook 0x365a050
   -30> 2016-08-13 22:30:54.593496 7fa8b9e088c0  5 asok(0x36a20f0) register_command perfcounters_schema hook 0x365a050
   -29> 2016-08-13 22:30:54.593499 7fa8b9e088c0  5 asok(0x36a20f0) register_command 2 hook 0x365a050
   -28> 2016-08-13 22:30:54.593501 7fa8b9e088c0  5 asok(0x36a20f0) register_command perf schema hook 0x365a050
   -27> 2016-08-13 22:30:54.593503 7fa8b9e088c0  5 asok(0x36a20f0) register_command perf reset hook 0x365a050
   -26> 2016-08-13 22:30:54.593505 7fa8b9e088c0  5 asok(0x36a20f0) register_command config show hook 0x365a050
   -25> 2016-08-13 22:30:54.593508 7fa8b9e088c0  5 asok(0x36a20f0) register_command config set hook 0x365a050
   -24> 2016-08-13 22:30:54.593510 7fa8b9e088c0  5 asok(0x36a20f0) register_command config get hook 0x365a050
   -23> 2016-08-13 22:30:54.593512 7fa8b9e088c0  5 asok(0x36a20f0) register_command config diff hook 0x365a050
   -22> 2016-08-13 22:30:54.593513 7fa8b9e088c0  5 asok(0x36a20f0) register_command log flush hook 0x365a050
   -21> 2016-08-13 22:30:54.593557 7fa8b9e088c0  5 asok(0x36a20f0) register_command log dump hook 0x365a050
   -20> 2016-08-13 22:30:54.593561 7fa8b9e088c0  5 asok(0x36a20f0) register_command log reopen hook 0x365a050
   -19> 2016-08-13 22:30:54.596039 7fa8b9e088c0  0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 708653
   -18> 2016-08-13 22:30:54.597587 7fa8b9e088c0  5 asok(0x36a20f0) init /var/run/ceph/ceph-mon.kh10-8.asok
   -17> 2016-08-13 22:30:54.597601 7fa8b9e088c0  5 asok(0x36a20f0) bind_and_listen /var/run/ceph/ceph-mon.kh10-8.asok
   -16> 2016-08-13 22:30:54.597767 7fa8b9e088c0  5 asok(0x36a20f0) register_command 0 hook 0x36560c0
   -15> 2016-08-13 22:30:54.597775 7fa8b9e088c0  5 asok(0x36a20f0) register_command version hook 0x36560c0
   -14> 2016-08-13 22:30:54.597778 7fa8b9e088c0  5 asok(0x36a20f0) register_command git_version hook 0x36560c0
   -13> 2016-08-13 22:30:54.597781 7fa8b9e088c0  5 asok(0x36a20f0) register_command help hook 0x365a150
   -12> 2016-08-13 22:30:54.597783 7fa8b9e088c0  5 asok(0x36a20f0) register_command get_command_descriptions hook 0x365a140
   -11> 2016-08-13 22:30:54.597860 7fa8b5181700  5 asok(0x36a20f0) entry start
   -10> 2016-08-13 22:30:54.608150 7fa8b9e088c0  0 starting mon.kh10-8 rank 2 at 10.64.64.125:6789/0 mon_data /var/lib/ceph/mon/ceph-kh10-8 fsid e452874b-cb29-4468-ac7f-f8901dfccebf
    -9> 2016-08-13 22:30:54.608210 7fa8b9e088c0  1 -- 10.64.64.125:6789/0 learned my addr 10.64.64.125:6789/0
    -8> 2016-08-13 22:30:54.608214 7fa8b9e088c0  1 accepter.accepter.bind my_inst.addr is 10.64.64.125:6789/0 need_addr=0
    -7> 2016-08-13 22:30:54.608279 7fa8b9e088c0  5 adding auth protocol: cephx
    -6> 2016-08-13 22:30:54.608282 7fa8b9e088c0  5 adding auth protocol: cephx
    -5> 2016-08-13 22:30:54.608311 7fa8b9e088c0 10 log_channel(cluster) update_config to_monitors: true to_syslog: false syslog_facility: daemon prio: info)
    -4> 2016-08-13 22:30:54.608317 7fa8b9e088c0 10 log_channel(audit) update_config to_monitors: true to_syslog: false syslog_facility: local0 prio: info)
    -3> 2016-08-13 22:30:54.608395 7fa8b9e088c0  1 mon.kh10-8@-1(probing) e1 preinit fsid e452874b-cb29-4468-ac7f-f8901dfccebf
    -2> 2016-08-13 22:30:54.608617 7fa8b9e088c0  1 mon.kh10-8@-1(probing).paxosservice(pgmap 0..35606392) refresh upgraded, format 0 -> 1
    -1> 2016-08-13 22:30:54.608629 7fa8b9e088c0  1 mon.kh10-8@-1(probing).pg v0 on_upgrade discarding in-core PGMap
     0> 2016-08-13 22:30:54.611791 7fa8b9e088c0 -1 *** Caught signal (Aborted) **
 in thread 7fa8b9e088c0

 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
 1: ceph-mon() [0x9b25ea]
 2: (()+0x10330) [0x7fa8b8f0b330]
 3: (gsignal()+0x37) [0x7fa8b73a8c37]
 4: (abort()+0x148) [0x7fa8b73ac028]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fa8b7cb3535]
 6: (()+0x5e6d6) [0x7fa8b7cb16d6]
 7: (()+0x5e703) [0x7fa8b7cb1703]
 8: (()+0x5e922) [0x7fa8b7cb1922]
 9: ceph-mon() [0x853c39]
 10: (object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x167) [0x894227]
 11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x5ff) [0x894baf]
 12: (PGMap::update_pg(pg_t, ceph::buffer::list&)+0xa3) [0x91a8d3]
 13: (PGMonitor::read_pgmap_full()+0x1d8) [0x68b9b8]
 14: (PGMonitor::update_from_paxos(bool*)+0xbf7) [0x6977b7]
 15: (PaxosService::refresh(bool*)+0x19a) [0x605b5a]
 16: (Monitor::refresh_from_paxos(bool*)+0x1db) [0x5b1ffb]
 17: (Monitor::init_paxos()+0x85) [0x5b2365]
 18: (Monitor::preinit()+0x7d7) [0x5b6f87]
 19: (main()+0x230c) [0x57853c]
 20: (__libc_start_main()+0xf5) [0x7fa8b7393f45]
 21: ceph-mon() [0x59a3c7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file 
--- end dump of recent events ---
Aborted (core dumped)
---------------------------------------
---------------------------------------

I feel like I am so close yet so far. Can anyone give me a nudge as to what I can do next? It looks like it is bombing out while refreshing the pgmap from paxos.
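(For anyone following along, pulling those maps back out of a recovered store.db should be roughly along these lines, assuming the monstore tool's get subcommands behave the same in this build)::
------------------------------------------------------------------------------
ceph-monstore-tool /var/lib/ceph/mon/ceph-kh10-8 get monmap -- --out /tmp/monmap
ceph-monstore-tool /var/lib/ceph/mon/ceph-kh10-8 get osdmap -- --out /tmp/osdmap
monmaptool --print /tmp/monmap
osdmaptool --print /tmp/osdmap
------------------------------------------------------------------------------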



On Fri, Aug 12, 2016 at 1:09 PM, Sean Sullivan <seapasulli@xxxxxxxxxxxx> wrote:
A coworker patched leveldb and we were able to export quite a bit of data from kh08's leveldb database. At this point I think I need to reconstruct a new leveldb with whatever values I can. Is it the same leveldb database across all 3 monitors? I.e., will keys exported from one work in the other? All should have the same keys/values, although constructed differently, right? I can't blindly copy /var/lib/ceph/mon/ceph-$(hostname)/store.db/ from one host to another, right? But can I copy the keys/values from one to another?

On Fri, Aug 12, 2016 at 12:45 PM, Sean Sullivan <seapasulli@xxxxxxxxxxxx> wrote:
ceph-monstore-tool? Is that the same as monmaptool? Oops, never mind -- found it in the ceph-test package::

I can't seem to get it working :-( -- dumping the monmap or running any of the other commands, they all bomb out with the same message:

root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# ceph-monstore-tool /var/lib/ceph/mon/ceph-kh10-8 dump-trace -- /tmp/test.trace
Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/store.db/10882319.ldb
root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# ceph-monstore-tool /var/lib/ceph/mon/ceph-kh10-8 dump-keys
Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/store.db/10882319.ldb


I need to clarify: I originally had 2 clusters with this issue, and now I have 1 with all 3 monitors dead and 1 that I was able to repair successfully. I am about to recap everything I know about the issue at hand. Should I start a new email thread about this instead?

The cluster that is currently having issues is on hammer (0.94.7), and the monitor specs are all the same::
root@kh08-8:~# cat /proc/cpuinfo | grep -iE "model name" | uniq -c
     24 model name : Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
Each monitor stores its data on an ext4 volume comprised of 4x 300GB 10k drives in RAID 10, running Ubuntu 14.04.

root@kh08-8:~# uname -a
Linux kh08-8 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@kh08-8:~# ceph --version
ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)


From here, these are the errors I am getting when starting each of the monitors::


---------------
root@kh08-8:~# /usr/bin/ceph-mon --cluster=ceph -i kh08-8 -d
2016-08-11 22:15:23.731550 7fe5ad3e98c0  0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 317309
Corruption: error in middle of record
2016-08-11 22:15:28.274340 7fe5ad3e98c0 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-kh08-8': (22) Invalid argument
--
root@kh09-8:~# /usr/bin/ceph-mon --cluster=ceph -i kh09-8 -d
2016-08-11 22:14:28.252370 7f7eaab908c0  0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 308888
Corruption: 14 missing files; e.g.: /var/lib/ceph/mon/ceph-kh09-8/store.db/10845998.ldb
2016-08-11 22:14:35.094237 7f7eaab908c0 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-kh09-8': (22) Invalid argument
--
root@kh10-8:/var/lib/ceph/mon/ceph-kh10-8/store.db# /usr/bin/ceph-mon --cluster=ceph -i kh10-8 -d
2016-08-11 22:17:54.632762 7f80bf34d8c0  0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mon, pid 292620
Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/ceph-kh10-8/store.db/10882319.ldb
2016-08-11 22:18:01.207749 7f80bf34d8c0 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-kh10-8': (22) Invalid argument
---------------


For kh08, a coworker patched leveldb to print and skip on the first error, and that store is also missing a bunch of files. As such I think kh10-8 is my most likely candidate to recover, but either way recovery is probably not an option. I see leveldb has a repair.cc (https://github.com/google/leveldb/blob/master/db/repair.cc) but I do not see repair mentioned anywhere in the monitor code with respect to its store. I tried using the leveldb python module (plyvel) to attempt a repair but my REPL just ends up dying.
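(The repair attempt was roughly a one-liner along these lines, assuming plyvel is installed -- leveldb's Repair is exposed there as repair_db)::

python -c "import plyvel; plyvel.repair_db('/var/lib/ceph/mon/ceph-kh10-8/store.db')"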

I understand two things::
1.) Without rebuilding the monitor backend leveldb (the cluster map, as I understand it), all of the data in the cluster is essentially lost (right?).
2.) It is possible to rebuild this database via some form of magic or (source)ry, as all of this data is essentially held throughout the cluster as well.

We only use radosgw / S3 for this cluster. If there is a way to recover my data that is easier/more likely to succeed than rebuilding the leveldb of a monitor and starting a single-monitor cluster up, I would like to switch gears and focus on that.

Looking at the dev docs, the cluster map has 5 main parts::

```
The Monitor Map: Contains the cluster fsid, the position, name address and port of each monitor. It also indicates the current epoch, when the map was created, and the last time it changed. To view a monitor map, execute ceph mon dump.
The OSD Map: Contains the cluster fsid, when the map was created and last modified, a list of pools, replica sizes, PG numbers, a list of OSDs and their status (e.g., up, in). To view an OSD map, execute ceph osd dump.
The PG Map: Contains the PG version, its time stamp, the last OSD map epoch, the full ratios, and details on each placement group such as the PG ID, the Up Set, the Acting Set, the state of the PG (e.g., active + clean), and data usage statistics for each pool.
The CRUSH Map: Contains a list of storage devices, the failure domain hierarchy (e.g., device, host, rack, row, room, etc.), and rules for traversing the hierarchy when storing data. To view a CRUSH map, execute ceph osd getcrushmap -o {filename}; then, decompile it by executing crushtool -d {comp-crushmap-filename} -o {decomp-crushmap-filename}. You can view the decompiled map in a text editor or with cat.
The MDS Map: Contains the current MDS map epoch, when the map was created, and the last time it changed. It also contains the pool for storing metadata, a list of metadata servers, and which metadata servers are up and in. To view an MDS map, execute ceph mds dump.
```

As we don't use CephFS, the MDS map can essentially be blank (right?), so I am left with 4 valid maps needed to get a working cluster again. I don't see auth mentioned in there, but I will need that too. Then I just need to rebuild the leveldb database somehow with the right information and I should be good. So a long, long journey ahead.

I don't think that the data is stored as plain strings or JSON, right? Am I going down the wrong path here? Is there a shorter/simpler path to retrieve the data from a cluster that lost all 3 monitors in a power failure? If I am going down the right path, is there any advice on how I can assemble/repair the database?

I see that there is an RBD recovery tool for a dead cluster. Is it possible to do the same with S3 objects?
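I assume that, worst case, the raw rados objects could be pulled straight off an offline OSD with ceph-objectstore-tool, something like the following (using an arbitrary OSD path as an example -- and reassembling multipart radosgw objects from the raw pieces would be another story)::
------------------------------------------------------------------------------
# list every object held by this OSD (one JSON spec per line)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --journal-path /var/lib/ceph/osd/ceph-0/journal --op list > /tmp/objects.json

# dump the bytes of one object spec taken from that listing
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --journal-path /var/lib/ceph/osd/ceph-0/journal \
    '<object spec from the listing>' get-bytes /tmp/object.raw
------------------------------------------------------------------------------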

On Thu, Aug 11, 2016 at 11:15 AM, Wido den Hollander <wido@xxxxxxxx> wrote:

> Op 11 augustus 2016 om 15:17 schreef Sean Sullivan <seapasulli@xxxxxxxxxxxx>:
>
>
> Hello Wido,
>
> Thanks for the advice.  While the data center has a/b circuits and
> redundant power, etc., if a ground fault happens it travels outside and
> causes the whole building to fail (apparently).
>
> The monitors are each the same with
> 2x e5 cpus
> 64gb of ram
> 4x 300gb 10k SAS drives in raid 10 (write through mode).
> Ubuntu 14.04 with the latest updates prior to power failure (2016/Aug/10 -
> 3am CST)
> Ceph hammer LTS 0.94.7
>
> (we are still working on our jewel test cluster so it is planned but not in
> place yet)
>
> The only thing that seems to be corrupt is the monitors leveldb store.  I
> see multiple issues on Google leveldb github from March 2016 about fsync
> and power failure so I assume this is an issue with leveldb.
>
> I have backed up /var/lib/ceph/mon on all of my monitors before trying to
> proceed with any form of recovery.
>
> Is there any way to reconstruct the leveldb or replace the monitors and
> recover the data?
>
I don't know. I have never done it. Other people might know this better than me.

Maybe 'ceph-monstore-tool' can help you?

Wido

> I found the following post in which Sage says it is tedious but possible. (
> http://www.spinics.net/lists/ceph-devel/msg06662.html). Tedious is fine if
> I have any chance of doing it.  I have the fsid, the Mon key map and all of
> the osds look to be fine so all of the previous osd maps  are there.
>
> I just don't understand what key/values I need inside.
>
> On Aug 11, 2016 1:33 AM, "Wido den Hollander" <wido@xxxxxxxx> wrote:
>
> >
> > > Op 11 augustus 2016 om 0:10 schreef Sean Sullivan <
> > seapasulli@xxxxxxxxxxxx>:
> > >
> > >
> > > I think it just got worse::
> > >
> > > all three monitors on my other cluster say that ceph-mon can't open
> > > /var/lib/ceph/mon/$(hostname). Is there any way to recover if you lose
> > all
> > > 3 monitors? I saw a post by Sage saying that the data can be recovered as
> > > all of the data is held on other servers. Is this possible? If so has
> > > anyone had any experience doing so?
> >
> > I have never done so, so I couldn't tell you.
> >
> > However, it is weird that on all three it got corrupted. What hardware are
> > you using? Was it properly protected against power failure?
> >
> > If your mon store is corrupted I'm not sure what might happen.
> >
> > However, make a backup of ALL monitors right now before doing anything.
> >
> > Wido
> >
> >



--
- Sean:  I wrote this. - 
