MON crashing when upgrading from Hammer to Luminous

Dear everyone,

First of all, seriously: thank you for Ceph.

Now to the problem. After upgrading Ceph from 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403) to 12.2.12-218-g9fd889f (9fd889fe09c652512ca78854702d5ad9bf3059bb), ceph-mon seems unable to upgrade its database and aborts during startup. The problem goes away if I start the monitor with --force-sync.

This is the message:
terminate called after throwing an instance of 'ceph::buffer::malformed_input'
  what():  buffer::malformed_input: void object_stat_sum_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
*** Caught signal (Aborted) **

Attached is the full log, i.e. the output of:
ceph-mon --debug_mon 100 -i node-1 -d

---
Armin ranjbar

2019-07-22 19:17:54.429120 7f2064488f40  0 ceph version 12.2.12-218-g9fd889f (9fd889fe09c652512ca78854702d5ad9bf3059bb) luminous (stable), process ceph-mon, pid 908122
2019-07-22 19:17:54.429229 7f2064488f40  0 pidfile_write: ignore empty --pid-file
2019-07-22 19:17:54.438472 7f2064488f40  0 load: jerasure load: lrc load: isa 
2019-07-22 19:17:54.438908 7f2064488f40  1 leveldb: Recovering log #4402802
2019-07-22 19:17:54.489204 7f2064488f40  1 leveldb: Delete type=0 #4402802

2019-07-22 19:17:54.489263 7f2064488f40  1 leveldb: Delete type=3 #4402801

2019-07-22 19:17:54.489547 7f2064488f40 10 obtain_monmap
terminate called after throwing an instance of 'ceph::buffer::malformed_input'
  what():  buffer::malformed_input: void object_stat_sum_t::decode(ceph::buffer::list::iterator&) decode past end of struct encoding
*** Caught signal (Aborted) **
 in thread 7f2064488f40 thread_name:ceph-mon
2019-07-22 19:17:54.489654 7f2064488f40 10 obtain_monmap read last committed monmap ver 3
2019-07-22 19:17:54.490558 7f2064488f40  0 starting mon.node-1 rank 2 at public addr 192.168.1.16:6789/0 at bind addr 192.168.1.16:6789/0 mon_data /var/lib/ceph/mon/ceph-node-1 fsid cf635990-70fa-43ed-978d-96f92f9ccc92
2019-07-22 19:17:54.490737 7f2064488f40  0 starting mon.node-1 rank 2 at 192.168.1.16:6789/0 mon_data /var/lib/ceph/mon/ceph-node-1 fsid cf635990-70fa-43ed-978d-96f92f9ccc92
2019-07-22 19:17:54.491279 7f2064488f40  1 mon.node-1@-1(probing) e3 preinit fsid cf635990-70fa-43ed-978d-96f92f9ccc92
2019-07-22 19:17:54.491351 7f2064488f40 10 mon.node-1@-1(probing) e3 check_fsid cluster_uuid contains 'cf635990-70fa-43ed-978d-96f92f9ccc92'
2019-07-22 19:17:54.491363 7f2064488f40 10 mon.node-1@-1(probing) e3 features compat={},rocompat={},incompat={1=initial feature set (~v.18),3=single paxos with k/v store (v0.?),4=support erasure code pools,5=new-style osdmap encoding,6=support isa/lrc erasure code}
2019-07-22 19:17:54.491371 7f2064488f40 10 mon.node-1@-1(probing) e3 calc_quorum_requirements required_features 18416819765248
2019-07-22 19:17:54.491374 7f2064488f40 10 mon.node-1@-1(probing) e3 required_features 18416819765248
2019-07-22 19:17:54.491381 7f2064488f40 10 mon.node-1@-1(probing) e3 has_ever_joined = 1
2019-07-22 19:17:54.491411 7f2064488f40 10 mon.node-1@-1(probing) e3 sync_last_committed_floor 0
2019-07-22 19:17:54.491413 7f2064488f40 10 mon.node-1@-1(probing) e3 init_paxos
2019-07-22 19:17:54.491516 7f2064488f40  1 mon.node-1@-1(probing).mds e0 Unable to load 'last_metadata'
2019-07-22 19:17:54.491558 7f2064488f40 10 mon.node-1@-1(probing).health init
2019-07-22 19:17:54.491574 7f2064488f40 10 mon.node-1@-1(probing) e3 refresh_from_paxos
2019-07-22 19:17:54.491608 7f2064488f40  1 mon.node-1@-1(probing).paxosservice(pgmap 21727587..21728259) refresh upgraded, format 0 -> 1
2019-07-22 19:17:54.491612 7f2064488f40  1 mon.node-1@-1(probing).pg v0 on_upgrade discarding in-core PGMap
2019-07-22 19:17:54.491635 7f2064488f40 10 mon.node-1@-1(probing).pg v0 update_from_paxos v0, read_full
2019-07-22 19:17:54.491638 7f2064488f40 10 mon.node-1@-1(probing).pg v0 read_pgmap_meta
 ceph version 12.2.12-218-g9fd889f (9fd889fe09c652512ca78854702d5ad9bf3059bb) luminous (stable)
 1: (()+0x96b249) [0x7f2063e73249]
 2: (()+0x10330) [0x7f20628a0330]
 3: (gsignal()+0x37) [0x7f2060e8bc37]
 4: (abort()+0x148) [0x7f2060e8f028]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f206179a535]
 6: (()+0x5e6d6) [0x7f20617986d6]
 7: (()+0x5e703) [0x7f2061798703]
 8: (()+0x5e922) [0x7f2061798922]
 9: (object_stat_sum_t::decode(ceph::buffer::list::iterator&)+0x650) [0x7f2063c81be0]
 10: (object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x4f) [0x7f2063c9627f]
 11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x1d5) [0x7f2063c96965]
 12: (PGMap::update_pg(pg_t, ceph::buffer::list&)+0xf4) [0x7f20639d93b4]
 13: (PGMonitor::read_pgmap_full()+0x161) [0x7f20639a8a81]
 14: (PGMonitor::update_from_paxos(bool*)+0x699) [0x7f20639b0479]
 15: (PaxosService::refresh(bool*)+0x1a3) [0x7f2063a55103]
 16: (Monitor::refresh_from_paxos(bool*)+0x183) [0x7f206390cd53]
 17: (Monitor::init_paxos()+0xfd) [0x7f206390d12d]
 18: (Monitor::preinit()+0xa7e) [0x7f206390dbee]
 19: (main()+0x3bf4) [0x7f206383cde4]
 20: (__libc_start_main()+0xf5) [0x7f2060e76f45]
 21: (()+0x3db4fe) [0x7f20638e34fe]

--- begin dump of recent events ---
   -61> 2019-07-22 19:17:54.421648 7f2064488f40  5 asok(0x7f206dcc6380) register_command perfcounters_dump hook 0x7f206dc52190
   -60> 2019-07-22 19:17:54.421691 7f2064488f40  5 asok(0x7f206dcc6380) register_command 1 hook 0x7f206dc52190
   -59> 2019-07-22 19:17:54.421697 7f2064488f40  5 asok(0x7f206dcc6380) register_command perf dump hook 0x7f206dc52190
   -58> 2019-07-22 19:17:54.421707 7f2064488f40  5 asok(0x7f206dcc6380) register_command perfcounters_schema hook 0x7f206dc52190
   -57> 2019-07-22 19:17:54.421710 7f2064488f40  5 asok(0x7f206dcc6380) register_command perf histogram dump hook 0x7f206dc52190
   -56> 2019-07-22 19:17:54.421715 7f2064488f40  5 asok(0x7f206dcc6380) register_command 2 hook 0x7f206dc52190
   -55> 2019-07-22 19:17:54.421723 7f2064488f40  5 asok(0x7f206dcc6380) register_command perf schema hook 0x7f206dc52190
   -54> 2019-07-22 19:17:54.421731 7f2064488f40  5 asok(0x7f206dcc6380) register_command perf histogram schema hook 0x7f206dc52190
   -53> 2019-07-22 19:17:54.421746 7f2064488f40  5 asok(0x7f206dcc6380) register_command perf reset hook 0x7f206dc52190
   -52> 2019-07-22 19:17:54.421750 7f2064488f40  5 asok(0x7f206dcc6380) register_command config show hook 0x7f206dc52190
   -51> 2019-07-22 19:17:54.421761 7f2064488f40  5 asok(0x7f206dcc6380) register_command config help hook 0x7f206dc52190
   -50> 2019-07-22 19:17:54.421773 7f2064488f40  5 asok(0x7f206dcc6380) register_command config set hook 0x7f206dc52190
   -49> 2019-07-22 19:17:54.421781 7f2064488f40  5 asok(0x7f206dcc6380) register_command config get hook 0x7f206dc52190
   -48> 2019-07-22 19:17:54.421785 7f2064488f40  5 asok(0x7f206dcc6380) register_command config diff hook 0x7f206dc52190
   -47> 2019-07-22 19:17:54.421791 7f2064488f40  5 asok(0x7f206dcc6380) register_command config diff get hook 0x7f206dc52190
   -46> 2019-07-22 19:17:54.421794 7f2064488f40  5 asok(0x7f206dcc6380) register_command log flush hook 0x7f206dc52190
   -45> 2019-07-22 19:17:54.421798 7f2064488f40  5 asok(0x7f206dcc6380) register_command log dump hook 0x7f206dc52190
   -44> 2019-07-22 19:17:54.421803 7f2064488f40  5 asok(0x7f206dcc6380) register_command log reopen hook 0x7f206dc52190
   -43> 2019-07-22 19:17:54.421821 7f2064488f40  5 asok(0x7f206dcc6380) register_command dump_mempools hook 0x7f206dc751e8
   -42> 2019-07-22 19:17:54.429120 7f2064488f40  0 ceph version 12.2.12-218-g9fd889f (9fd889fe09c652512ca78854702d5ad9bf3059bb) luminous (stable), process ceph-mon, pid 908122
   -41> 2019-07-22 19:17:54.429229 7f2064488f40  0 pidfile_write: ignore empty --pid-file
   -40> 2019-07-22 19:17:54.431720 7f2064488f40  5 asok(0x7f206dcc6380) init /var/run/ceph/ceph-mon.node-1.asok
   -39> 2019-07-22 19:17:54.431751 7f2064488f40  5 asok(0x7f206dcc6380) bind_and_listen /var/run/ceph/ceph-mon.node-1.asok
   -38> 2019-07-22 19:17:54.431866 7f2064488f40  5 asok(0x7f206dcc6380) register_command 0 hook 0x7f206dc4e0c0
   -37> 2019-07-22 19:17:54.431876 7f2064488f40  5 asok(0x7f206dcc6380) register_command version hook 0x7f206dc4e0c0
   -36> 2019-07-22 19:17:54.431889 7f2064488f40  5 asok(0x7f206dcc6380) register_command git_version hook 0x7f206dc4e0c0
   -35> 2019-07-22 19:17:54.431895 7f2064488f40  5 asok(0x7f206dcc6380) register_command help hook 0x7f206dc521d0
   -34> 2019-07-22 19:17:54.431904 7f2064488f40  5 asok(0x7f206dcc6380) register_command get_command_descriptions hook 0x7f206dc522d0
   -33> 2019-07-22 19:17:54.431968 7f205ec61700  5 asok(0x7f206dcc6380) entry start
   -32> 2019-07-22 19:17:54.438472 7f2064488f40  0 load: jerasure load: lrc load: isa 
   -31> 2019-07-22 19:17:54.438908 7f2064488f40  1 leveldb: Recovering log #4402802
   -30> 2019-07-22 19:17:54.489204 7f2064488f40  1 leveldb: Delete type=0 #4402802

   -29> 2019-07-22 19:17:54.489263 7f2064488f40  1 leveldb: Delete type=3 #4402801

   -28> 2019-07-22 19:17:54.489547 7f2064488f40 10 obtain_monmap
   -27> 2019-07-22 19:17:54.489654 7f2064488f40 10 obtain_monmap read last committed monmap ver 3
   -26> 2019-07-22 19:17:54.490318 7f205cf95700  2 Event(0x7f206dcc4080 nevent=5000 time_id=1).set_owner idx=0 owner=139776975525632
   -25> 2019-07-22 19:17:54.490397 7f205c794700  2 Event(0x7f206dcc5680 nevent=5000 time_id=1).set_owner idx=1 owner=139776967132928
   -24> 2019-07-22 19:17:54.490427 7f205bf93700  2 Event(0x7f206dcc5280 nevent=5000 time_id=1).set_owner idx=2 owner=139776958740224
   -23> 2019-07-22 19:17:54.490558 7f2064488f40  0 starting mon.node-1 rank 2 at public addr 192.168.1.16:6789/0 at bind addr 192.168.1.16:6789/0 mon_data /var/lib/ceph/mon/ceph-node-1 fsid cf635990-70fa-43ed-978d-96f92f9ccc92
   -22> 2019-07-22 19:17:54.490690 7f2064488f40  1 -- 192.168.1.16:6789/0 learned_addr learned my addr 192.168.1.16:6789/0
   -21> 2019-07-22 19:17:54.490696 7f2064488f40  1 -- 192.168.1.16:6789/0 _finish_bind bind my_inst.addr is 192.168.1.16:6789/0
   -20> 2019-07-22 19:17:54.490737 7f2064488f40  0 starting mon.node-1 rank 2 at 192.168.1.16:6789/0 mon_data /var/lib/ceph/mon/ceph-node-1 fsid cf635990-70fa-43ed-978d-96f92f9ccc92
   -19> 2019-07-22 19:17:54.490772 7f2064488f40  5 adding auth protocol: cephx
   -18> 2019-07-22 19:17:54.490773 7f2064488f40  5 adding auth protocol: cephx
   -17> 2019-07-22 19:17:54.490817 7f2064488f40 10 log_channel(cluster) update_config to_monitors: true to_syslog: false syslog_facility: daemon prio: info to_graylog: false graylog_host: 127.0.0.1 graylog_port: 12201)
   -16> 2019-07-22 19:17:54.490820 7f2064488f40 10 log_channel(audit) update_config to_monitors: true to_syslog: false syslog_facility: local0 prio: info to_graylog: false graylog_host: 127.0.0.1 graylog_port: 12201)
   -15> 2019-07-22 19:17:54.491279 7f2064488f40  1 mon.node-1@-1(probing) e3 preinit fsid cf635990-70fa-43ed-978d-96f92f9ccc92
   -14> 2019-07-22 19:17:54.491351 7f2064488f40 10 mon.node-1@-1(probing) e3 check_fsid cluster_uuid contains 'cf635990-70fa-43ed-978d-96f92f9ccc92'
   -13> 2019-07-22 19:17:54.491363 7f2064488f40 10 mon.node-1@-1(probing) e3 features compat={},rocompat={},incompat={1=initial feature set (~v.18),3=single paxos with k/v store (v0.?),4=support erasure code pools,5=new-style osdmap encoding,6=support isa/lrc erasure code}
   -12> 2019-07-22 19:17:54.491371 7f2064488f40 10 mon.node-1@-1(probing) e3 calc_quorum_requirements required_features 18416819765248
   -11> 2019-07-22 19:17:54.491374 7f2064488f40 10 mon.node-1@-1(probing) e3 required_features 18416819765248
   -10> 2019-07-22 19:17:54.491381 7f2064488f40 10 mon.node-1@-1(probing) e3 has_ever_joined = 1
    -9> 2019-07-22 19:17:54.491411 7f2064488f40 10 mon.node-1@-1(probing) e3 sync_last_committed_floor 0
    -8> 2019-07-22 19:17:54.491413 7f2064488f40 10 mon.node-1@-1(probing) e3 init_paxos
    -7> 2019-07-22 19:17:54.491516 7f2064488f40  1 mon.node-1@-1(probing).mds e0 Unable to load 'last_metadata'
    -6> 2019-07-22 19:17:54.491558 7f2064488f40 10 mon.node-1@-1(probing).health init
    -5> 2019-07-22 19:17:54.491574 7f2064488f40 10 mon.node-1@-1(probing) e3 refresh_from_paxos
    -4> 2019-07-22 19:17:54.491608 7f2064488f40  1 mon.node-1@-1(probing).paxosservice(pgmap 21727587..21728259) refresh upgraded, format 0 -> 1
    -3> 2019-07-22 19:17:54.491612 7f2064488f40  1 mon.node-1@-1(probing).pg v0 on_upgrade discarding in-core PGMap
    -2> 2019-07-22 19:17:54.491635 7f2064488f40 10 mon.node-1@-1(probing).pg v0 update_from_paxos v0, read_full
    -1> 2019-07-22 19:17:54.491638 7f2064488f40 10 mon.node-1@-1(probing).pg v0 read_pgmap_meta
     0> 2019-07-22 19:17:54.495504 7f2064488f40 -1 *** Caught signal (Aborted) **
 in thread 7f2064488f40 thread_name:ceph-mon

 ceph version 12.2.12-218-g9fd889f (9fd889fe09c652512ca78854702d5ad9bf3059bb) luminous (stable)
 1: (()+0x96b249) [0x7f2063e73249]
 2: (()+0x10330) [0x7f20628a0330]
 3: (gsignal()+0x37) [0x7f2060e8bc37]
 4: (abort()+0x148) [0x7f2060e8f028]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f206179a535]
 6: (()+0x5e6d6) [0x7f20617986d6]
 7: (()+0x5e703) [0x7f2061798703]
 8: (()+0x5e922) [0x7f2061798922]
 9: (object_stat_sum_t::decode(ceph::buffer::list::iterator&)+0x650) [0x7f2063c81be0]
 10: (object_stat_collection_t::decode(ceph::buffer::list::iterator&)+0x4f) [0x7f2063c9627f]
 11: (pg_stat_t::decode(ceph::buffer::list::iterator&)+0x1d5) [0x7f2063c96965]
 12: (PGMap::update_pg(pg_t, ceph::buffer::list&)+0xf4) [0x7f20639d93b4]
 13: (PGMonitor::read_pgmap_full()+0x161) [0x7f20639a8a81]
 14: (PGMonitor::update_from_paxos(bool*)+0x699) [0x7f20639b0479]
 15: (PaxosService::refresh(bool*)+0x1a3) [0x7f2063a55103]
 16: (Monitor::refresh_from_paxos(bool*)+0x183) [0x7f206390cd53]
 17: (Monitor::init_paxos()+0xfd) [0x7f206390d12d]
 18: (Monitor::preinit()+0xa7e) [0x7f206390dbee]
 19: (main()+0x3bf4) [0x7f206383cde4]
 20: (__libc_start_main()+0xf5) [0x7f2060e76f45]
 21: (()+0x3db4fe) [0x7f20638e34fe]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
  100/100 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  99/99 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file 
--- end dump of recent events ---

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
