Re: Hit suicide timeout after adding new osd

The secondary crash is fixed by 17160843d0c523359d8fa934418ff2c1f7bffb25.
Also backported to bobtail.

If you have the core from the second instance of the heartbeat timeout, 
from the wip-pg-removal branch, please post the 'thread apply all bt' 
output.

Thanks!
sage



On Sat, 19 Jan 2013, Jens Kristian Søgaard wrote:

> Hi Sage,
> 
> > Do you have a full log for this?
> 
> I upped the log level and started the osd again.
> 
> It ran for 23 seconds and then suddenly crashed out of the blue.
> 
> The last log lines were:
> 
> 2013-01-19 19:31:39.975475 7f50de7fc700 10 osd.2 pg_epoch: 416 pg[0.fc( v
> 164'38593 (164'37592,164'38593] local-les=247 n=2622 ec=1 les/c 247/156
> 379/379/379) [1,3] r=-1 lpr=379 pi=152-378/14 lcod 0'0 inactive NOTIFY]
> state<Reset>: Reset advmap
> 2013-01-19 19:31:39.975483 7f50de7fc700 10 osd.2 pg_epoch: 416 pg[0.fc( v
> 164'38593 (164'37592,164'38593] local-les=247 n=2622 ec=1 les/c 247/156
> 379/379/379) [1,3] r=-1 lpr=379 pi=152-378/14 lcod 0'0 inactive NOTIFY]
> _calc_past_interval_range: already have past intervals back to 156
> 2013-01-19 19:31:39.975495 7f50de7fc700 10 osd.2 pg_epoch: 416 pg[0.fc( v
> 164'38593 (164'37592,164'38593] local-les=247 n=2622 ec=1 les/c 247/156
> 379/379/379) [1,3] r=-1 lpr=379 pi=152-378/14 lcod 0'0 inactive NOTIFY]
> handle_advance_map [1,3]/[1,3]
> 2013-01-19 19:31:39.975505 7f50de7fc700 10 osd.2 pg_epoch: 417 pg[0.fc( v
> 164'38593 (164'37592,164'38593] local-les=247 n=2622 ec=1 les/c 247/156
> 379/379/379) [1,3] r=-1 lpr=379 pi=152-378/14 lcod 0'0 inactive NOTIFY]
> state<Reset>: Reset advmap
> 2013-01-19 19:31:39.975513 7f50de7fc700 10
> 
> 
> The stack trace from the core file shows:
> 
> Program terminated with signal 6, Aborted.
> #0  0x000000360de0eebb in raise () from /lib64/libpthread.so.0
> Missing separate debuginfos, use: debuginfo-install
> boost-thread-1.48.0-13.fc17.x86_64 glibc-2.15-57.fc17.x86_64
> libaio-0.3.109-5.fc17.x86_64 libgcc-4.7.2-2.fc17.x86_64
> libstdc++-4.7.2-2.fc17.x86_64 libuuid-2.21.2-2.fc17.x86_64
> nspr-4.9.2-1.fc17.x86_64 nss-3.13.5-1.fc17.x86_64
> nss-softokn-3.13.5-1.fc17.x86_64 nss-softokn-freebl-3.13.5-1.fc17.x86_64
> nss-util-3.13.5-1.fc17.x86_64 sqlite-3.7.11-3.fc17.x86_64
> (gdb) bt
> #0  0x000000360de0eebb in raise () from /lib64/libpthread.so.0
> #1  0x000000000082f7a6 in reraise_fatal (signum=6) at
> global/signal_handler.cc:58
> #2  handle_fatal_signal (signum=6) at global/signal_handler.cc:104
> #3  <signal handler called>
> #4  0x000000360d635925 in raise () from /lib64/libc.so.6
> #5  0x000000360d6370d8 in abort () from /lib64/libc.so.6
> #6  0x0000003611660dad in __gnu_cxx::__verbose_terminate_handler() () from
> /lib64/libstdc++.so.6
> #7  0x000000361165eea6 in ?? () from /lib64/libstdc++.so.6
> #8  0x000000361165eed3 in std::terminate() () from /lib64/libstdc++.so.6
> #9  0x000000361165f0fe in __cxa_throw () from /lib64/libstdc++.so.6
> #10 0x00000000008d5edd in ceph::__ceph_assert_fail (assertion=0x99b1b8
> "exists(osd)", file=<optimized out>, line=367, func=0x99fa20 "const epoch_t&
> OSDMap::get_up_thru(int) const") at common/assert.cc:77
> #11 0x000000000060db42 in OSDMap::get_up_thru (osd=<optimized out>,
> this=<optimized out>) at osd/OSDMap.h:367
> #12 0x00000000006e3b35 in OSDMap::get_up_thru (this=<optimized out>,
> osd=<optimized out>) at osd/OSDMap.h:369
> #13 0x0000000000935590 in pg_interval_t::check_new_interval (old_acting=...,
> new_acting=..., old_up=..., new_up=..., same_interval_since=553,
> last_epoch_clean=425, osdmap=std::tr1::shared_ptr (count 83, weak 1)
> 0x2d59530,
>     lastmap=std::tr1::shared_ptr (count 59, weak 1) 0x2e85650, pool_id=0,
> pgid=..., past_intervals=0xc62ef78, out=0x0) at osd/osd_types.cc:1537
> #14 0x00000000007563c3 in PG::start_peering_interval
> (this=this@entry=0xc62e880, lastmap=std::tr1::shared_ptr (count 59, weak 1)
> 0x2e85650, newup=std::vector of length 2, capacity 2 = {...},
> newacting=std::vector of length 3, capacity 3 = {...}) at osd/PG.cc:4624
> #15 0x000000000075887e in PG::RecoveryState::Reset::react
> (this=this@entry=0x9581270, advmap=...) at osd/PG.cc:5241
> #16 0x000000000078abb6 in react<PG::RecoveryState::Reset,
> boost::statechart::event_base, void const*> (evt=..., stt=...,
> eventType=<optimized out>) at
> /usr/include/boost/statechart/custom_reaction.hpp:42
> #17 boost::statechart::simple_state<PG::RecoveryState::Reset,
> PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
> (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list5<boost::statechart::custom_reaction<PG::AdvMap>,
> boost::statechart::custom_reaction<PG::ActMap>,
> boost::statechart::custom_reaction<PG::NullEvt>,
> boost::statechart::custom_reaction<PG::FlushedEvt>,
> boost::statechart::transition<boost::statechart::event_base,
> PG::RecoveryState::Crashed,
> boost::statechart::detail::no_context<boost::statechart::event_base>,
> &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function>
> >, boost::statechart::simple_state<PG::RecoveryState::Reset,
> PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >
> (stt=..., evt=..., eventType=eventType@entry=0xcdc260) at
> /usr/include/boost/statechart/simple_state.hpp:816
> #18 0x000000000078ac33 in
> local_react<boost::mpl::list5<boost::statechart::custom_reaction<PG::AdvMap>,
> boost::statechart::custom_reaction<PG::ActMap>,
> boost::statechart::custom_reaction<PG::NullEvt>,
> boost::statechart::custom_reaction<PG::FlushedEvt>,
> boost::statechart::transition<boost::statechart::event_base,
> PG::RecoveryState::Crashed> > > (eventType=0xcdc260, evt=..., this=0x9581270)
> at /usr/include/boost/statechart/simple_state.hpp:851
> #19 boost::statechart::simple_state<PG::RecoveryState::Reset,
> PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
> (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list<boost::statechart::custom_reaction<PG::QueryState>,
> boost::statechart::custom_reaction<PG::AdvMap>,
> boost::statechart::custom_reaction<PG::ActMap>,
> boost::statechart::custom_reaction<PG::NullEvt>,
> boost::statechart::custom_reaction<PG::FlushedEvt>,
> boost::statechart::transition<boost::statechart::event_base,
> PG::RecoveryState::Crashed,
> boost::statechart::detail::no_context<boost::statechart::event_base>,
> &boost::statechart::detail::no_context<boost::statechart::event_base>::no_function>,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
> boost::statechart::simple_state<PG::RecoveryState::Reset,
> PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >
> (stt=..., evt=..., eventType=0xcdc260) at
> /usr/include/boost/statechart/simple_state.hpp:820
> #20 0x000000000076f58b in operator() (this=<synthetic pointer>) at
> /usr/include/boost/statechart/state_machine.hpp:87
> #21
> operator()<boost::statechart::detail::send_function<boost::statechart::detail::state_base<std::allocator<void>,
> boost::statechart::detail::rtti_policy>, boost::statechart::event_base, const
> void*>, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
> PG::RecoveryState::Initial>::exception_event_handler> (action=...,
> this=<optimized out>) at
> /usr/include/boost/statechart/null_exception_translator.hpp:33
> #22 boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
> PG::RecoveryState::Initial, std::allocator<void>,
> boost::statechart::null_exception_translator>::send_event (this=0xc62fb50,
> evt=...) at /usr/include/boost/statechart/state_machine.hpp:885
> #23 0x000000000076f619 in
> boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
> PG::RecoveryState::Initial, std::allocator<void>,
> boost::statechart::null_exception_translator>::process_event
> (this=this@entry=0xc62fb50, evt=...)
>     at /usr/include/boost/statechart/state_machine.hpp:275
> #24 0x000000000076f6cd in PG::RecoveryState::handle_event (this=0xc62fb50,
> evt=..., rctx=0x7f50ddffaa70) at osd/PG.h:1682
> #25 0x000000000072bf46 in PG::handle_advance_map (this=0xc62e880,
> osdmap=std::tr1::shared_ptr (count 83, weak 1) 0x2d59530,
> lastmap=std::tr1::shared_ptr (count 59, weak 1) 0x2e85650, newup=std::vector
> of length 2, capacity 2 = {...},
>     newacting=std::vector of length 3, capacity 4 = {...},
> rctx=0x7f50ddffaa70) at osd/PG.cc:5050
> #26 0x00000000006cf14b in OSD::advance_pg (this=this@entry=0x2a27640,
> osd_epoch=760, pg=pg@entry=0xc62e880, rctx=rctx@entry=0x7f50ddffaa70,
> new_pgs=new_pgs@entry=0x7f50ddffaa40) at osd/OSD.cc:4042
> Python Exception <type 'exceptions.IndexError'> list index out of range:
> #27 0x00000000006cf7f6 in OSD::process_peering_events (this=0x2a27640,
> pgs=std::list) at osd/OSD.cc:6170
> Python Exception <type 'exceptions.IndexError'> list index out of range:
> #28 0x000000000070a3f7 in OSD::PeeringWQ::_process (this=<optimized out>,
> pgs=std::list) at osd/OSD.h:718
> #29 0x00000000008ccccc in ThreadPool::worker (this=0x2a27a88, wt=0x5cd2cd0) at
> common/WorkQueue.cc:113
> #30 0x00000000008cdc40 in ThreadPool::WorkThread::entry (this=<optimized out>)
> at common/WorkQueue.h:288
> #31 0x000000360de07d14 in start_thread () from /lib64/libpthread.so.0
> #32 0x000000360d6f167d in clone () from /lib64/libc.so.6
> 
> 
> Do you want a full copy of the log file?
> 
> It generated 128 MB of logs in those seconds.
> 
> -- 
> Jens Kristian Søgaard, Mermaid Consulting ApS,
> jens@xxxxxxxxxxxxxxxxxxxx,
> http://www.mermaidconsulting.com/
> 
> 