I am upgrading a cluster from a mix of 0.94.2/0.94.3 to 0.94.5 this
morning and am seeing identical crashes. During a rolling upgrade of the
mons, right after the third of the 3 mons was restarted on 0.94.5, all 3
crashed simultaneously, just as you describe. Since then the 3 mons have
been crashing and respawning continually. I am still upgrading about 200
OSDs to 0.94.5, so most of them are still running 0.94.2 or 0.94.3.
There are 3 MDSs running 0.94.5 during these crashes.

==> /var/log/clusterboot/lsn-mc1008/syslog <==
Nov 10 10:07:30 lsn-mc1008 kernel: [6392349.844640] init: ceph-mon (ceph/lsn-mc1008) main process (2254664) killed by SEGV signal
Nov 10 10:07:30 lsn-mc1008 kernel: [6392349.844648] init: ceph-mon (ceph/lsn-mc1008) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1006/syslog <==
Nov 10 10:07:46 lsn-mc1006 kernel: [6392890.294124] init: ceph-mon (ceph/lsn-mc1006) main process (2183307) killed by SEGV signal
Nov 10 10:07:46 lsn-mc1006 kernel: [6392890.294132] init: ceph-mon (ceph/lsn-mc1006) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1007/syslog <==
Nov 10 10:07:46 lsn-mc1007 kernel: [6392599.894914] init: ceph-mon (ceph/lsn-mc1007) main process (1998234) killed by SEGV signal
Nov 10 10:07:46 lsn-mc1007 kernel: [6392599.894923] init: ceph-mon (ceph/lsn-mc1007) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1008/syslog <==
Nov 10 10:07:46 lsn-mc1008 kernel: [6392365.959984] init: ceph-mon (ceph/lsn-mc1008) main process (2263082) killed by SEGV signal
Nov 10 10:07:46 lsn-mc1008 kernel: [6392365.959992] init: ceph-mon (ceph/lsn-mc1008) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1006/syslog <==
Nov 10 10:07:52 lsn-mc1006 kernel: [6392896.674332] init: ceph-mon (ceph/lsn-mc1006) main process (2191273) killed by SEGV signal
Nov 10 10:07:52 lsn-mc1006 kernel: [6392896.674340] init: ceph-mon (ceph/lsn-mc1006) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1008/syslog <==
Nov 10 10:07:52 lsn-mc1008 kernel: [6392372.324282] init: ceph-mon (ceph/lsn-mc1008) main process (2270979) killed by SEGV signal
Nov 10 10:07:52 lsn-mc1008 kernel: [6392372.324295] init: ceph-mon (ceph/lsn-mc1008) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1007/syslog <==
Nov 10 10:07:52 lsn-mc1007 kernel: [6392606.272911] init: ceph-mon (ceph/lsn-mc1007) main process (2006118) killed by SEGV signal
Nov 10 10:07:52 lsn-mc1007 kernel: [6392606.272995] init: ceph-mon (ceph/lsn-mc1007) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1006/syslog <==
Nov 10 10:07:55 lsn-mc1006 kernel: [6392899.046307] init: ceph-mon (ceph/lsn-mc1006) main process (2192187) killed by SEGV signal
Nov 10 10:07:55 lsn-mc1006 kernel: [6392899.046315] init: ceph-mon (ceph/lsn-mc1006) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1007/syslog <==
Nov 10 10:08:17 lsn-mc1007 kernel: [6392631.192476] init: ceph-mon (ceph/lsn-mc1007) main process (2006489) killed by SEGV signal
Nov 10 10:08:17 lsn-mc1007 kernel: [6392631.192484] init: ceph-mon (ceph/lsn-mc1007) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1006/syslog <==
Nov 10 10:08:17 lsn-mc1006 kernel: [6392921.600089] init: ceph-mon (ceph/lsn-mc1006) main process (2192298) killed by SEGV signal
Nov 10 10:08:17 lsn-mc1006 kernel: [6392921.600108] init: ceph-mon (ceph/lsn-mc1006) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1008/syslog <==
Nov 10 10:08:17 lsn-mc1008 kernel: [6392397.277994] init: ceph-mon (ceph/lsn-mc1008) main process (2271246) killed by SEGV signal
Nov 10 10:08:17 lsn-mc1008 kernel: [6392397.278002] init: ceph-mon (ceph/lsn-mc1008) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1006/syslog <==
Nov 10 10:08:23 lsn-mc1006 kernel: [6392927.999229] init: ceph-mon (ceph/lsn-mc1006) main process (2200399) killed by SEGV signal
Nov 10 10:08:23 lsn-mc1006 kernel: [6392927.999242] init: ceph-mon (ceph/lsn-mc1006) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1008/syslog <==
Nov 10 10:08:23 lsn-mc1008 kernel: [6392403.641241] init: ceph-mon (ceph/lsn-mc1008) main process (2279050) killed by SEGV signal
Nov 10 10:08:23 lsn-mc1008 kernel: [6392403.641254] init: ceph-mon (ceph/lsn-mc1008) main process ended, respawning

==> /var/log/clusterboot/lsn-mc1007/syslog <==
Nov 10 10:08:24 lsn-mc1007 kernel: [6392637.614495] init: ceph-mon (ceph/lsn-mc1007) main process (2013418) killed by SEGV signal
Nov 10 10:08:24 lsn-mc1007 kernel: [6392637.614504] init: ceph-mon (ceph/lsn-mc1007) main process ended, respawning

On Mon, Nov 2, 2015 at 8:35 AM, Arnulf Heimsbakk <arnulf.heimsbakk@xxxxxx> wrote:
> When I did a unset noout on the cluster all three mons got a
> segmentation fault, then continued as if nothing had happened. Regular
> segmentation faults started on mons after upgrading to 0.94.5. Ubuntu
> Trusty LTS. Anyone had similar?
>
> -Arnulf
>
> Backtraces:
>
> mon1:
>
> #0  0x00007f0b2969120b in raise (sig=11)
>     at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
> #1  0x00000000009adfbd in reraise_fatal (signum=11)
>     at global/signal_handler.cc:59
> #2  handle_fatal_signal (signum=11) at global/signal_handler.cc:109
> #3  <signal handler called>
> #4  0x00000000006518e5 in std::_Rb_tree<std::string,
>     std::pair<std::string const, std::string>,
>     std::_Select1st<std::pair<std::string const, std::string> >,
>     std::less<std::string>, std::allocator<std::pair<std::string const,
>     std::string> > >::find (this=this@entry=0x47dac90, __k=...)
>     at /usr/include/c++/4.8/bits/stl_tree.h:1805
> #5  0x00000000008a002e in find (__x=..., this=<optimized out>)
>     at /usr/include/c++/4.8/bits/stl_map.h:837
> #6  get_str_map_key (str_map=..., key=...,
>     fallback_key=fallback_key@entry=0xd1d210
>     <_ZL23CLOG_CONFIG_DEFAULT_KEY>)
>     at common/str_map.cc:120
> #7  0x00000000006b0a5a in get_facility (channel=..., this=0x47dac30)
>     at mon/LogMonitor.h:79
> #8  LogMonitor::update_from_paxos (this=0x47dab40,
>     need_bootstrap=<optimized out>) at mon/LogMonitor.cc:141
> #9  0x000000000060432a in PaxosService::refresh (this=0x47dab40,
>     need_bootstrap=need_bootstrap@entry=0x7f0b208b9f3f)
>     at mon/PaxosService.cc:128
> #10 0x00000000005b03db in Monitor::refresh_from_paxos (this=0x4968000,
>     need_bootstrap=need_bootstrap@entry=0x7f0b208b9f3f) at
>     mon/Monitor.cc:788
> #11 0x00000000005eea5e in Paxos::do_refresh (this=this@entry=0x4874dc0)
>     at mon/Paxos.cc:1008
> #12 0x00000000005f5c83 in Paxos::handle_commit
>     (this=this@entry=0x4874dc0,
>     commit=commit@entry=0x73a7480) at mon/Paxos.cc:933
> #13 0x00000000005fd7bb in Paxos::dispatch (this=0x4874dc0,
>     m=m@entry=0x73a7480) at mon/Paxos.cc:1399
> #14 0x00000000005cf9e3 in Monitor::dispatch (this=this@entry=0x4968000,
>     s=s@entry=0x47d7f80, m=m@entry=0x73a7480,
>     src_is_mon=src_is_mon@entry=true) at mon/Monitor.cc:3567
> #15 0x00000000005cfe36 in Monitor::_ms_dispatch
>     (this=this@entry=0x4968000,
>     m=m@entry=0x73a7480) at mon/Monitor.cc:3376
> #16 0x00000000005edb43 in Monitor::ms_dispatch (this=0x4968000,
>     m=0x73a7480)
>     at mon/Monitor.h:833
> #17 0x0000000000929679 in ms_deliver_dispatch (m=0x73a7480,
>     this=0x49be700)
>     at ./msg/Messenger.h:567
> #18 DispatchQueue::entry (this=0x49be8c8) at
>     msg/simple/DispatchQueue.cc:185
> #19 0x00000000007c99cd in DispatchQueue::DispatchThread::entry (
>     this=<optimized out>) at msg/simple/DispatchQueue.h:103
> #20 0x00007f0b29689182 in start_thread (arg=0x7f0b208bb700)
>     at pthread_create.c:312
> #21 0x00007f0b27bf447d in clone ()
>     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
>
> mon2:
>
> #0  0x00007fd27c06520b in raise () from
>     /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000009adfbd in reraise_fatal (signum=11)
>     at global/signal_handler.cc:59
> #2  handle_fatal_signal (signum=11) at global/signal_handler.cc:109
> #3  <signal handler called>
> #4  0x00000000006518e5 in std::_Rb_tree<std::string,
>     std::pair<std::string const, std::string>,
>     std::_Select1st<std::pair<std::string const, std::string> >,
>     std::less<std::string>, std::allocator<std::pair<std::string const,
>     std::string> > >::find (this=this@entry=0x36a6390, __k=...)
>     at /usr/include/c++/4.8/bits/stl_tree.h:1805
> #5  0x00000000008a002e in find (__x=..., this=<optimized out>)
>     at /usr/include/c++/4.8/bits/stl_map.h:837
> #6  get_str_map_key (str_map=..., key=...,
>     fallback_key=fallback_key@entry=0xd1d210
>     <_ZL23CLOG_CONFIG_DEFAULT_KEY>)
>     at common/str_map.cc:120
> #7  0x00000000006b0a5a in get_facility (channel=..., this=0x36a6330)
>     at mon/LogMonitor.h:79
> #8  LogMonitor::update_from_paxos (this=0x36a6240,
>     need_bootstrap=<optimized out>) at mon/LogMonitor.cc:141
> #9  0x000000000060432a in PaxosService::refresh (this=0x36a6240,
>     need_bootstrap=need_bootstrap@entry=0x7fd276f5d6af)
>     at mon/PaxosService.cc:128
> #10 0x00000000005b03db in Monitor::refresh_from_paxos (this=0x37feb00,
>     need_bootstrap=need_bootstrap@entry=0x7fd276f5d6af) at
>     mon/Monitor.cc:788
> #11 0x00000000005eea5e in Paxos::do_refresh (this=this@entry=0x3740dc0)
>     at mon/Paxos.cc:1008
> #12 0x00000000005fbf39 in Paxos::commit_finish (this=0x3740dc0)
>     at mon/Paxos.cc:903
> #13 0x000000000060038b in C_Committed::finish (this=0x4600ad0,
>     r=<optimized out>) at mon/Paxos.cc:807
> #14 0x00000000005d4d89 in Context::complete (this=0x4600ad0,
>     r=<optimized out>) at ./include/Context.h:65
> #15 0x00000000005ff4bc in MonitorDBStore::C_DoTransaction::finish (
>     this=0x38258c0, r=<optimized out>) at mon/MonitorDBStore.h:326
> #16 0x00000000005d4d89 in Context::complete (this=0x38258c0,
>     r=<optimized out>) at ./include/Context.h:65
> #17 0x0000000000717e88 in Finisher::finisher_thread_entry (this=0x3683350)
>     at common/Finisher.cc:59
> #18 0x00007fd27c05d182 in start_thread ()
>     from /lib/x86_64-linux-gnu/libpthread.so.0
> #19 0x00007fd27a5c847d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>
> mon3:
>
> #0  0x00007f4f0cfce20b in raise () from
>     /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000009adfbd in reraise_fatal (signum=11)
>     at global/signal_handler.cc:59
> #2  handle_fatal_signal (signum=11) at global/signal_handler.cc:109
> #3  <signal handler called>
> #4  0x00000000006518e5 in std::_Rb_tree<std::string,
>     std::pair<std::string const, std::string>,
>     std::_Select1st<std::pair<std::string const, std::string> >,
>     std::less<std::string>, std::allocator<std::pair<std::string const,
>     std::string> > >::find (this=this@entry=0x35a4c90, __k=...)
>     at /usr/include/c++/4.8/bits/stl_tree.h:1805
> #5  0x00000000008a002e in find (__x=..., this=<optimized out>)
>     at /usr/include/c++/4.8/bits/stl_map.h:837
> #6  get_str_map_key (str_map=..., key=...,
>     fallback_key=fallback_key@entry=0xd1d210
>     <_ZL23CLOG_CONFIG_DEFAULT_KEY>)
>     at common/str_map.cc:120
> #7  0x00000000006b0a5a in get_facility (channel=..., this=0x35a4c30)
>     at mon/LogMonitor.h:79
> #8  LogMonitor::update_from_paxos (this=0x35a4b40,
>     need_bootstrap=<optimized out>) at mon/LogMonitor.cc:141
> #9  0x000000000060432a in PaxosService::refresh (this=0x35a4b40,
>     need_bootstrap=need_bootstrap@entry=0x7f4f038aef3f)
>     at mon/PaxosService.cc:128
> #10 0x00000000005b03db in Monitor::refresh_from_paxos (this=0x3d34b00,
>     need_bootstrap=need_bootstrap@entry=0x7f4f038aef3f) at
>     mon/Monitor.cc:788
> #11 0x00000000005eea5e in Paxos::do_refresh (this=this@entry=0x363f080)
>     at mon/Paxos.cc:1008
> #12 0x00000000005f5c83 in Paxos::handle_commit
>     (this=this@entry=0x363f080,
>     commit=commit@entry=0x6c1d900) at mon/Paxos.cc:933
> #13 0x00000000005fd7bb in Paxos::dispatch (this=0x363f080,
>     m=m@entry=0x6c1d900) at mon/Paxos.cc:1399
> #14 0x00000000005cf9e3 in Monitor::dispatch (this=this@entry=0x3d34b00,
>     s=s@entry=0x35a2f40, m=m@entry=0x6c1d900,
>     src_is_mon=src_is_mon@entry=true) at mon/Monitor.cc:3567
> #15 0x00000000005cfe36 in Monitor::_ms_dispatch
>     (this=this@entry=0x3d34b00,
>     m=m@entry=0x6c1d900) at mon/Monitor.cc:3376
> #16 0x00000000005edb43 in Monitor::ms_dispatch (this=0x3d34b00,
>     m=0x6c1d900)
>     at mon/Monitor.h:833
> #17 0x0000000000929679 in ms_deliver_dispatch (m=0x6c1d900,
>     this=0x3ccae00)
>     at ./msg/Messenger.h:567
> #18 DispatchQueue::entry (this=0x3ccafc8) at
>     msg/simple/DispatchQueue.cc:185
> #19 0x00000000007c99cd in DispatchQueue::DispatchThread::entry (
>     this=<optimized out>) at msg/simple/DispatchQueue.h:103
> #20 0x00007f4f0cfc6182 in start_thread ()
>     from /lib/x86_64-linux-gnu/libpthread.so.0
> #21 0x00007f4f0b53147d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com