Re: ceph-osd crash

On Wed, Sep 6, 2017 at 7:34 PM, Wyllys Ingersoll
<wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
> Jewel 10.2.7 on Ubuntu 16.04.2,
>
> I have an OSD that keeps going down, these are the messages in the
> log.  Is this a known bug?

I'm not seeing anything obviously related to this when searching my
archives. If the OSD is still in this state, you should be able to get
more useful info out of it by setting "debug osd = 20" in ceph.conf and
starting it up. In particular, it should print out the actual assert
that is failing (or maybe it is already dumping that and you didn't
copy it?).
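
If the log is large, a small script can help locate the assert line.
The following is only a rough sketch: it assumes the failure is logged
on a line containing the text "FAILED assert", and the helper names
(find_failed_asserts, print_failed_asserts) and the log path in the
usage comment are made up for illustration, so adjust them to your
setup.

```python
import re

# Assumption: the failing assert is logged on a line containing
# "FAILED assert". Adjust the pattern if your log looks different.
ASSERT_RE = re.compile(r"FAILED assert")


def find_failed_asserts(lines, context=1):
    """Return one list per match: `context` preceding lines plus the match."""
    lines = list(lines)
    hits = []
    for i, line in enumerate(lines):
        if ASSERT_RE.search(line):
            start = max(0, i - context)
            hits.append([text.rstrip("\n") for text in lines[start:i + 1]])
    return hits


def print_failed_asserts(path, context=1):
    """Print every assert block found in the log file at `path`."""
    # errors="replace" keeps stray non-UTF-8 bytes from aborting the scan.
    with open(path, errors="replace") as f:
        for block in find_failed_asserts(f, context):
            print("\n".join(block))
            print("--")


# Example call (hypothetical path for osd.16):
# print_failed_asserts("/var/log/ceph/ceph-osd.16.log")
```

The preceding line is included because the function name and thread
are often logged just before the assert itself.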

If you can create a bug report and use ceph-post-file to upload the log,
that would help diagnose the issue.
-Greg

>
>     -3> 2017-09-06 22:33:14.509181 7f7e18429700  5 osd.16 pg_epoch:
> 107818 pg[9.cd( v 35684'19200 (35250'16102,35684'19200]
> local-les=107801 n=17 ec=1076 les/c/f 107801/107802/0
> 107816/107818/107818) [16,1,65] r=0 lpr=107818 pi=107484-107817/30
> crt=35684'19200 lcod 0'0 mlcod 0'0 peering] exit
> Started/Primary/Peering/GetMissing 0.000008 0 0.000000
>     -2> 2017-09-06 22:33:14.509193 7f7e18429700  5 osd.16 pg_epoch:
> 107818 pg[9.cd( v 35684'19200 (35250'16102,35684'19200]
> local-les=107801 n=17 ec=1076 les/c/f 107801/107802/0
> 107816/107818/107818) [16,1,65] r=0 lpr=107818 pi=107484-107817/30
> crt=35684'19200 lcod 0'0 mlcod 0'0 peering] enter
> Started/Primary/Peering/WaitUpThru
>     -1> 2017-09-06 22:33:14.525481 7f7e9cb37700  1 leveldb: Level-0
> table #803535: 2374544 bytes OK
>      0> 2017-09-06 22:33:14.526739 7f7e19c2c700 -1 *** Caught signal
> (Aborted) **
>  in thread 7f7e19c2c700 thread_name:tp_osd
>
>  ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>  1: (()+0x9770ae) [0x56326b73d0ae]
>  2: (()+0x11390) [0x7f7ea9c64390]
>  3: (gsignal()+0x38) [0x7f7ea7c02428]
>  4: (abort()+0x16a) [0x7f7ea7c0402a]
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x26b) [0x56326b83d54b]
>  6: (PGLog::merge_log(ObjectStore::Transaction&, pg_info_t&,
> pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&,
> bool&)+0x1a12) [0x56326b3f5ff2]
>  7: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&,
> pg_shard_t)+0xcd) [0x56326b2034ad]
>  8: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&,
> pg_log_t&, pg_missing_t&, pg_shard_t)+0xc8) [0x56326b20d948]
>  9: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog
> const&)+0x1e7) [0x56326b22faf7]
>  10: (boost::statechart::simple_state<PG::RecoveryState::GetLog,
> PG::RecoveryState::Peering, boost::mpl::list<mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
> (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
> const&, void const*)+0x1fe) [0x56326b271d1e]
>  11: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
> PG::RecoveryState::Initial, std::allocator<void>,
> boost::statechart::null_exception_translator>::process_queued_events()+0x131)
> [0x56326b24f891]
>  12: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine,
> PG::RecoveryState::Initial, std::allocator<void>,
> boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
> const&)+0xee) [0x56326b24fdce]
>  13: (PG::handle_peering_event(std::shared_ptr<PG::CephPeeringEvt>,
> PG::RecoveryCtx*)+0x395) [0x56326b223025]
>  14: (OSD::process_peering_events(std::__cxx11::list<PG*,
> std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x2d4)
> [0x56326b16fdf4]
>  15: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*,
> ThreadPool::TPHandle&)+0x25) [0x56326b1b88e5]
>  16: (ThreadPool::worker(ThreadPool::WorkThread*)+0xdb1) [0x56326b82f531]
>  17: (ThreadPool::WorkThread::entry()+0x10) [0x56326b830630]
>  18: (()+0x76ba) [0x7f7ea9c5a6ba]
>  19: (clone()+0x6d) [0x7f7ea7cd382d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


