Re: all three mons segfault at same time

This morning I started upgrading a cluster running a mix of 0.94.2 and
0.94.3 to 0.94.5, and I am seeing identical crashes. During a rolling
upgrade of the mons, after the third of three mons was restarted on
0.94.5, all three crashed simultaneously, exactly as you describe
below. Now I am seeing continual rolling crashes across the three mons.
I am still upgrading about 200 OSDs to 0.94.5, so most of them are
still running 0.94.2 or 0.94.3. Three MDSs were running 0.94.5 during
these crashes.

==> /var/log/clusterboot/lsn-mc1008/syslog <==
Nov 10 10:07:30 lsn-mc1008 kernel: [6392349.844640] init: ceph-mon
(ceph/lsn-mc1008) main process (2254664) killed by SEGV signal
Nov 10 10:07:30 lsn-mc1008 kernel: [6392349.844648] init: ceph-mon
(ceph/lsn-mc1008) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1006/syslog <==
Nov 10 10:07:46 lsn-mc1006 kernel: [6392890.294124] init: ceph-mon
(ceph/lsn-mc1006) main process (2183307) killed by SEGV signal
Nov 10 10:07:46 lsn-mc1006 kernel: [6392890.294132] init: ceph-mon
(ceph/lsn-mc1006) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1007/syslog <==
Nov 10 10:07:46 lsn-mc1007 kernel: [6392599.894914] init: ceph-mon
(ceph/lsn-mc1007) main process (1998234) killed by SEGV signal
Nov 10 10:07:46 lsn-mc1007 kernel: [6392599.894923] init: ceph-mon
(ceph/lsn-mc1007) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1008/syslog <==
Nov 10 10:07:46 lsn-mc1008 kernel: [6392365.959984] init: ceph-mon
(ceph/lsn-mc1008) main process (2263082) killed by SEGV signal
Nov 10 10:07:46 lsn-mc1008 kernel: [6392365.959992] init: ceph-mon
(ceph/lsn-mc1008) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1006/syslog <==
Nov 10 10:07:52 lsn-mc1006 kernel: [6392896.674332] init: ceph-mon
(ceph/lsn-mc1006) main process (2191273) killed by SEGV signal
Nov 10 10:07:52 lsn-mc1006 kernel: [6392896.674340] init: ceph-mon
(ceph/lsn-mc1006) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1008/syslog <==
Nov 10 10:07:52 lsn-mc1008 kernel: [6392372.324282] init: ceph-mon
(ceph/lsn-mc1008) main process (2270979) killed by SEGV signal
Nov 10 10:07:52 lsn-mc1008 kernel: [6392372.324295] init: ceph-mon
(ceph/lsn-mc1008) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1007/syslog <==
Nov 10 10:07:52 lsn-mc1007 kernel: [6392606.272911] init: ceph-mon
(ceph/lsn-mc1007) main process (2006118) killed by SEGV signal
Nov 10 10:07:52 lsn-mc1007 kernel: [6392606.272995] init: ceph-mon
(ceph/lsn-mc1007) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1006/syslog <==
Nov 10 10:07:55 lsn-mc1006 kernel: [6392899.046307] init: ceph-mon
(ceph/lsn-mc1006) main process (2192187) killed by SEGV signal
Nov 10 10:07:55 lsn-mc1006 kernel: [6392899.046315] init: ceph-mon
(ceph/lsn-mc1006) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1007/syslog <==
Nov 10 10:08:17 lsn-mc1007 kernel: [6392631.192476] init: ceph-mon
(ceph/lsn-mc1007) main process (2006489) killed by SEGV signal
Nov 10 10:08:17 lsn-mc1007 kernel: [6392631.192484] init: ceph-mon
(ceph/lsn-mc1007) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1006/syslog <==
Nov 10 10:08:17 lsn-mc1006 kernel: [6392921.600089] init: ceph-mon
(ceph/lsn-mc1006) main process (2192298) killed by SEGV signal
Nov 10 10:08:17 lsn-mc1006 kernel: [6392921.600108] init: ceph-mon
(ceph/lsn-mc1006) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1008/syslog <==
Nov 10 10:08:17 lsn-mc1008 kernel: [6392397.277994] init: ceph-mon
(ceph/lsn-mc1008) main process (2271246) killed by SEGV signal
Nov 10 10:08:17 lsn-mc1008 kernel: [6392397.278002] init: ceph-mon
(ceph/lsn-mc1008) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1006/syslog <==
Nov 10 10:08:23 lsn-mc1006 kernel: [6392927.999229] init: ceph-mon
(ceph/lsn-mc1006) main process (2200399) killed by SEGV signal
Nov 10 10:08:23 lsn-mc1006 kernel: [6392927.999242] init: ceph-mon
(ceph/lsn-mc1006) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1008/syslog <==
Nov 10 10:08:23 lsn-mc1008 kernel: [6392403.641241] init: ceph-mon
(ceph/lsn-mc1008) main process (2279050) killed by SEGV signal
Nov 10 10:08:23 lsn-mc1008 kernel: [6392403.641254] init: ceph-mon
(ceph/lsn-mc1008) main process ended, respawning
==> /var/log/clusterboot/lsn-mc1007/syslog <==
Nov 10 10:08:24 lsn-mc1007 kernel: [6392637.614495] init: ceph-mon
(ceph/lsn-mc1007) main process (2013418) killed by SEGV signal
Nov 10 10:08:24 lsn-mc1007 kernel: [6392637.614504] init: ceph-mon
(ceph/lsn-mc1007) main process ended, respawning
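
To check how closely the crashes line up, the SEGV lines from the three
syslogs can be grouped by second. Below is a minimal C++ sketch, assuming
each syslog entry sits on a single line (the excerpts above were wrapped
by the mail client) and follows the upstart/kernel format shown.
segv_by_second is a hypothetical helper for illustration, not a Ceph tool.

```cpp
// Hypothetical helper (not part of Ceph): group "ceph-mon ... killed by SEGV
// signal" syslog lines by timestamp so simultaneous crashes stand out.
// Assumes one unwrapped syslog entry per line, in the upstart/kernel format
// shown above. Keys are compared as plain strings, so ordering is only
// meaningful within a single day.
#include <istream>
#include <map>
#include <regex>
#include <set>
#include <string>

// Maps "Mon DD HH:MM:SS" -> set of hostnames whose ceph-mon died with
// SIGSEGV during that second.
std::map<std::string, std::set<std::string>> segv_by_second(std::istream& in) {
    // Group 1: syslog timestamp, group 2: hostname.
    static const std::regex re(
        R"(^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) kernel: .*init: ceph-mon .*killed by SEGV signal)");
    std::map<std::string, std::set<std::string>> crashes;
    std::string line;
    std::smatch m;
    while (std::getline(in, line)) {
        // "main process ended, respawning" lines do not match and are skipped.
        if (std::regex_search(line, m, re)) {
            crashes[m[1].str()].insert(m[2].str());
        }
    }
    return crashes;
}
```

Fed the three concatenated syslogs (for example by calling
segv_by_second(std::cin) from a small main and printing each entry), the
excerpt above would yield entries such as 10:07:46, 10:07:52 and 10:08:17
each mapped to all three of lsn-mc1006, lsn-mc1007 and lsn-mc1008.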


On Mon, Nov 2, 2015 at 8:35 AM, Arnulf Heimsbakk
<arnulf.heimsbakk@xxxxxx> wrote:
> When I unset noout on the cluster, all three mons got a segmentation
> fault, then continued as if nothing had happened. Regular segmentation
> faults started on the mons after upgrading to 0.94.5 on Ubuntu Trusty
> LTS. Has anyone seen something similar?
>
> -Arnulf
>
> Backtraces:
>
> mon1:
>
> #0  0x00007f0b2969120b in raise (sig=11)
>     at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
> #1  0x00000000009adfbd in reraise_fatal (signum=11)
>     at global/signal_handler.cc:59
> #2  handle_fatal_signal (signum=11) at global/signal_handler.cc:109
> #3  <signal handler called>
> #4  0x00000000006518e5 in std::_Rb_tree<std::string,
> std::pair<std::string const, std::string>,
> std::_Select1st<std::pair<std::string const, std::string> >,
> std::less<std::string>, std::allocator<std::pair<std::string const,
> std::string> > >::find (this=this@entry=0x47dac90, __k=...)
>     at /usr/include/c++/4.8/bits/stl_tree.h:1805
> #5  0x00000000008a002e in find (__x=..., this=<optimized out>)
>     at /usr/include/c++/4.8/bits/stl_map.h:837
> #6  get_str_map_key (str_map=..., key=...,
>     fallback_key=fallback_key@entry=0xd1d210
> <_ZL23CLOG_CONFIG_DEFAULT_KEY>)
>     at common/str_map.cc:120
> #7  0x00000000006b0a5a in get_facility (channel=..., this=0x47dac30)
>     at mon/LogMonitor.h:79
> #8  LogMonitor::update_from_paxos (this=0x47dab40,
>     need_bootstrap=<optimized out>) at mon/LogMonitor.cc:141
> #9  0x000000000060432a in PaxosService::refresh (this=0x47dab40,
>     need_bootstrap=need_bootstrap@entry=0x7f0b208b9f3f)
>     at mon/PaxosService.cc:128
> #10 0x00000000005b03db in Monitor::refresh_from_paxos (this=0x4968000,
>     need_bootstrap=need_bootstrap@entry=0x7f0b208b9f3f) at
> mon/Monitor.cc:788
> #11 0x00000000005eea5e in Paxos::do_refresh (this=this@entry=0x4874dc0)
>     at mon/Paxos.cc:1008
> #12 0x00000000005f5c83 in Paxos::handle_commit
> (this=this@entry=0x4874dc0,
>     commit=commit@entry=0x73a7480) at mon/Paxos.cc:933
> #13 0x00000000005fd7bb in Paxos::dispatch (this=0x4874dc0,
>     m=m@entry=0x73a7480) at mon/Paxos.cc:1399
> #14 0x00000000005cf9e3 in Monitor::dispatch (this=this@entry=0x4968000,
>     s=s@entry=0x47d7f80, m=m@entry=0x73a7480,
>     src_is_mon=src_is_mon@entry=true) at mon/Monitor.cc:3567
> #15 0x00000000005cfe36 in Monitor::_ms_dispatch
> (this=this@entry=0x4968000,
>     m=m@entry=0x73a7480) at mon/Monitor.cc:3376
> #16 0x00000000005edb43 in Monitor::ms_dispatch (this=0x4968000,
> m=0x73a7480)
>     at mon/Monitor.h:833
> #17 0x0000000000929679 in ms_deliver_dispatch (m=0x73a7480,
> this=0x49be700)
>     at ./msg/Messenger.h:567
> #18 DispatchQueue::entry (this=0x49be8c8) at
> msg/simple/DispatchQueue.cc:185
> #19 0x00000000007c99cd in DispatchQueue::DispatchThread::entry (
>     this=<optimized out>) at msg/simple/DispatchQueue.h:103
> #20 0x00007f0b29689182 in start_thread (arg=0x7f0b208bb700)
>     at pthread_create.c:312
> #21 0x00007f0b27bf447d in clone ()
>     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
>
> mon2:
>
> #0  0x00007fd27c06520b in raise () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000009adfbd in reraise_fatal (signum=11)
>     at global/signal_handler.cc:59
> #2  handle_fatal_signal (signum=11) at global/signal_handler.cc:109
> #3  <signal handler called>
> #4  0x00000000006518e5 in std::_Rb_tree<std::string,
> std::pair<std::string const, std::string>,
> std::_Select1st<std::pair<std::string const, std::string> >,
> std::less<std::string>, std::allocator<std::pair<std::string const,
> std::string> > >::find (this=this@entry=0x36a6390, __k=...)
>     at /usr/include/c++/4.8/bits/stl_tree.h:1805
> #5  0x00000000008a002e in find (__x=..., this=<optimized out>)
>     at /usr/include/c++/4.8/bits/stl_map.h:837
> #6  get_str_map_key (str_map=..., key=...,
>     fallback_key=fallback_key@entry=0xd1d210
> <_ZL23CLOG_CONFIG_DEFAULT_KEY>)
>     at common/str_map.cc:120
> #7  0x00000000006b0a5a in get_facility (channel=..., this=0x36a6330)
>     at mon/LogMonitor.h:79
> #8  LogMonitor::update_from_paxos (this=0x36a6240,
>     need_bootstrap=<optimized out>) at mon/LogMonitor.cc:141
> #9  0x000000000060432a in PaxosService::refresh (this=0x36a6240,
>     need_bootstrap=need_bootstrap@entry=0x7fd276f5d6af)
>     at mon/PaxosService.cc:128
> #10 0x00000000005b03db in Monitor::refresh_from_paxos (this=0x37feb00,
>     need_bootstrap=need_bootstrap@entry=0x7fd276f5d6af) at
> mon/Monitor.cc:788
> #11 0x00000000005eea5e in Paxos::do_refresh (this=this@entry=0x3740dc0)
>     at mon/Paxos.cc:1008
> #12 0x00000000005fbf39 in Paxos::commit_finish (this=0x3740dc0)
>     at mon/Paxos.cc:903
> #13 0x000000000060038b in C_Committed::finish (this=0x4600ad0,
>     r=<optimized out>) at mon/Paxos.cc:807
> #14 0x00000000005d4d89 in Context::complete (this=0x4600ad0,
>     r=<optimized out>) at ./include/Context.h:65
> #15 0x00000000005ff4bc in MonitorDBStore::C_DoTransaction::finish (
>     this=0x38258c0, r=<optimized out>) at mon/MonitorDBStore.h:326
> #16 0x00000000005d4d89 in Context::complete (this=0x38258c0,
>     r=<optimized out>) at ./include/Context.h:65
> #17 0x0000000000717e88 in Finisher::finisher_thread_entry (this=0x3683350)
>     at common/Finisher.cc:59
> #18 0x00007fd27c05d182 in start_thread ()
>    from /lib/x86_64-linux-gnu/libpthread.so.0
> #19 0x00007fd27a5c847d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>
> mon3:
>
> #0  0x00007f4f0cfce20b in raise () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00000000009adfbd in reraise_fatal (signum=11)
>     at global/signal_handler.cc:59
> #2  handle_fatal_signal (signum=11) at global/signal_handler.cc:109
> #3  <signal handler called>
> #4  0x00000000006518e5 in std::_Rb_tree<std::string,
> std::pair<std::string const, std::string>,
> std::_Select1st<std::pair<std::string const, std::string> >,
> std::less<std::string>, std::allocator<std::pair<std::string const,
> std::string> > >::find (this=this@entry=0x35a4c90, __k=...)
>     at /usr/include/c++/4.8/bits/stl_tree.h:1805
> #5  0x00000000008a002e in find (__x=..., this=<optimized out>)
>     at /usr/include/c++/4.8/bits/stl_map.h:837
> #6  get_str_map_key (str_map=..., key=...,
>     fallback_key=fallback_key@entry=0xd1d210
> <_ZL23CLOG_CONFIG_DEFAULT_KEY>)
>     at common/str_map.cc:120
> #7  0x00000000006b0a5a in get_facility (channel=..., this=0x35a4c30)
>     at mon/LogMonitor.h:79
> #8  LogMonitor::update_from_paxos (this=0x35a4b40,
>     need_bootstrap=<optimized out>) at mon/LogMonitor.cc:141
> #9  0x000000000060432a in PaxosService::refresh (this=0x35a4b40,
>     need_bootstrap=need_bootstrap@entry=0x7f4f038aef3f)
>     at mon/PaxosService.cc:128
> #10 0x00000000005b03db in Monitor::refresh_from_paxos (this=0x3d34b00,
>     need_bootstrap=need_bootstrap@entry=0x7f4f038aef3f) at
> mon/Monitor.cc:788
> #11 0x00000000005eea5e in Paxos::do_refresh (this=this@entry=0x363f080)
>     at mon/Paxos.cc:1008
> #12 0x00000000005f5c83 in Paxos::handle_commit
> (this=this@entry=0x363f080,
>     commit=commit@entry=0x6c1d900) at mon/Paxos.cc:933
> #13 0x00000000005fd7bb in Paxos::dispatch (this=0x363f080,
>     m=m@entry=0x6c1d900) at mon/Paxos.cc:1399
> #14 0x00000000005cf9e3 in Monitor::dispatch (this=this@entry=0x3d34b00,
>     s=s@entry=0x35a2f40, m=m@entry=0x6c1d900,
>     src_is_mon=src_is_mon@entry=true) at mon/Monitor.cc:3567
> #15 0x00000000005cfe36 in Monitor::_ms_dispatch
> (this=this@entry=0x3d34b00,
>     m=m@entry=0x6c1d900) at mon/Monitor.cc:3376
> #16 0x00000000005edb43 in Monitor::ms_dispatch (this=0x3d34b00,
> m=0x6c1d900)
>     at mon/Monitor.h:833
> #17 0x0000000000929679 in ms_deliver_dispatch (m=0x6c1d900,
> this=0x3ccae00)
>     at ./msg/Messenger.h:567
> #18 DispatchQueue::entry (this=0x3ccafc8) at
> msg/simple/DispatchQueue.cc:185
> #19 0x00000000007c99cd in DispatchQueue::DispatchThread::entry (
>     this=<optimized out>) at msg/simple/DispatchQueue.h:103
> #20 0x00007f4f0cfc6182 in start_thread ()
>    from /lib/x86_64-linux-gnu/libpthread.so.0
> #21 0x00007f4f0b53147d in clone () from /lib/x86_64-linux-gnu/libc.so.6
>
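
All three backtraces fail in the same frames: LogMonitor::update_from_paxos
(frame #8) calls get_facility (#7), which calls get_str_map_key with
CLOG_CONFIG_DEFAULT_KEY as the fallback key (#6), which faults inside
std::map::find (#4/#5). mon1 and mon3 get there from the dispatch thread
via Paxos::handle_commit, mon2 from the Finisher thread via
Paxos::commit_finish. The sketch below is a hypothetical lookup-with-fallback
over a std::map<std::string, std::string>, written only to show what those
frames do; it is not the Ceph source and its names are assumptions. Because
find() only reads the tree, a missing key cannot by itself cause a fault, so
a SIGSEGV inside _Rb_tree::find suggests (as an inference from the
backtrace, not a confirmed root cause) that the map object or the key string
passed in was invalid or being modified concurrently.

```cpp
// Hypothetical sketch, NOT Ceph's get_str_map_key: look up `key`, and if it
// is absent, look up `*fallback_key` instead. Mirrors frames #4-#6 above.
#include <map>
#include <string>

using StrMap = std::map<std::string, std::string>;

// Returns str_map[key] if present, else str_map[*fallback_key] if that is
// present, else an empty string. Never inserts into the map.
std::string lookup_with_fallback(const StrMap& str_map,
                                 const std::string& key,
                                 const std::string* fallback_key) {
    auto it = str_map.find(key);  // read-only tree walk (std::map::find)
    if (it != str_map.end()) return it->second;
    if (fallback_key != nullptr) {
        it = str_map.find(*fallback_key);  // second read-only walk
        if (it != str_map.end()) return it->second;
    }
    return std::string();
}
```

On a well-formed map both lookups simply miss or hit; neither can
dereference freed memory unless the map or key itself is already broken.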
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com