Re: ceph-osd fails to start - crash log

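The stack trace indicates that the OSD aborts while trying to allocate
memory: operator new[] is called from PGLog::read_log(), which is
reached through OSD::load_pgs() during OSD::init().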
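
For anyone who wants to check other crash logs for the same pattern,
here is a rough sketch in Python. It assumes the frames have already
been joined onto one line each (the mail client wrapped some of them in
the quote below), and the helper names parse_frames and
looks_like_failed_allocation are made up for illustration. It only
checks whether an abort/terminate frame sits above an operator new
frame, which is consistent with a failed allocation but is not proof of
one.

```python
import re

# Matches one backtrace frame as printed in the crash dump, for example:
#   10: (operator new[](unsigned long)+0x4e7) [0x7f7ebe1e64b7]
FRAME_RE = re.compile(r"^(\d+): \((.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def parse_frames(lines):
    """Return (frame_number, symbol) pairs for lines that look like frames."""
    frames = []
    for line in lines:
        # Drop mail quoting ("> ") and surrounding whitespace first.
        match = FRAME_RE.match(line.lstrip("> ").strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def looks_like_failed_allocation(frames):
    """True if abort/terminate appears closer to the crash than operator new."""
    symbols = [symbol for _, symbol in frames]
    for index, symbol in enumerate(symbols):
        if symbol.startswith("operator new"):
            earlier = symbols[:index]
            return any("abort" in s or "terminate" in s for s in earlier)
    return False


# A few frames copied from the trace quoted below.
sample = [
    ">  4: (abort()+0x16a) [0x7f7ebb48a02a]",
    ">  5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f7ebbdca84d]",
    ">  10: (operator new[](unsigned long)+0x4e7) [0x7f7ebe1e64b7]",
    ">  14: (OSD::load_pgs()+0x87a) [0x511a548f0a]",
]
frames = parse_frames(sample)
print(looks_like_failed_allocation(frames))
```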

This may be the same problem as the one described in this thread:
https://www.spinics.net/lists/ceph-devel/msg37961.html. If so, the same
solution (upgrading to Luminous) could help. Otherwise, there is
apparently a patch floating around that might help reduce memory usage
in this scenario.

Some more details about your cluster would be useful: how many nodes,
how many OSDs per node, the size of the OSDs, how much RAM, what kind
of CPUs, the networking setup, etc.
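
If it helps, most of that can be gathered in one go with something like
the sketch below. The command list is only my suggestion of what covers
those details, and it assumes the tools are installed on the node and
that the ceph CLI can reach a monitor; anything that cannot be run or
that hangs is reported as a short note instead of stopping the script.
The function names run_cmd and collect_report are made up for
illustration.

```python
import subprocess

# Suggested commands covering cluster layout, OSD sizes, RAM, CPUs and
# network interfaces. Whether each tool exists on a given node is an
# assumption.
COMMANDS = [
    ["ceph", "-s"],
    ["ceph", "osd", "tree"],
    ["ceph", "osd", "df"],
    ["free", "-h"],
    ["lscpu"],
    ["ip", "-br", "addr"],
]


def run_cmd(cmd, timeout=30):
    """Run one command and return its output, or a short note on failure."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout,
        )
    except OSError as err:
        return "(could not run {}: {})".format(cmd[0], err)
    except subprocess.TimeoutExpired:
        return "(timed out after {}s)".format(timeout)
    # Show stderr when the command fails so the reason is visible.
    return result.stdout if result.returncode == 0 else result.stderr


def collect_report(commands=COMMANDS):
    """Return one text report with a '$ command' header per section."""
    sections = []
    for cmd in commands:
        sections.append("$ " + " ".join(cmd) + "\n" + run_cmd(cmd).rstrip())
    return "\n\n".join(sections)
```

Running print(collect_report()) on one of the OSD nodes and pasting the
result into a reply would be enough to start with.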


On Sat, Sep 2, 2017 at 4:32 AM, Wyllys Ingersoll
<wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
> ceph 10.2.7
> Ubuntu 16.04.2
> Kernel: 4.9.44
>
> I have a system in a bad state, and many of the OSDs are failing to
> start, they come up for a little while, then die.  I need some help
> figuring out how to get these OSDs to come up and stay up so my system
> can rebalance itself.
>
> The logs show the following.
>
>
>    -14> 2017-09-01 12:27:32.836207 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47945 pg[26.2a3( empty local-les=46494 n=0 ec=35203 les/c/f
> 47869/47869/0 47889/47896/47896) [39,30,94] r=0 lpr=0
> pi=46430-47895/15 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] enter Reset
>    -13> 2017-09-01 12:27:32.878713 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47899 pg[7.5f7(unlocked)] enter Initial
>    -12> 2017-09-01 12:27:32.910644 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47899 pg[7.5f7( v 29917'81518 (18780'78457,29917'81518]
> local-les=42702 n=11 ec=1511 les/c/f 42702/41354/0 47896/47896/45989)
> [12,39,82]/[12,39] r=1 lpr=0 pi=41345-47895/44 crt=29917'81518 lcod
> 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 0.031932 0 0.000000
>    -11> 2017-09-01 12:27:32.910684 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47899 pg[7.5f7( v 29917'81518 (18780'78457,29917'81518]
> local-les=42702 n=11 ec=1511 les/c/f 42702/41354/0 47896/47896/45989)
> [12,39,82]/[12,39] r=1 lpr=0 pi=41345-47895/44 crt=29917'81518 lcod
> 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
>    -10> 2017-09-01 12:27:32.934425 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47899 pg[22.637(unlocked)] enter Initial
>     -9> 2017-09-01 12:27:32.934646 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47899 pg[22.637( empty local-les=46401 n=0 ec=19250 les/c/f
> 47869/47869/0 47889/47896/47896) [39,69,35] r=0 lpr=0
> pi=46353-47895/12 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] exit Initial
> 0.000220 0 0.000000
>     -8> 2017-09-01 12:27:32.934668 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47899 pg[22.637( empty local-les=46401 n=0 ec=19250 les/c/f
> 47869/47869/0 47889/47896/47896) [39,69,35] r=0 lpr=0
> pi=46353-47895/12 crt=0'0 mlcod 0'0 inactive NIBBLEWISE] enter Reset
>     -7> 2017-09-01 12:27:32.976842 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47922 pg[7.67f(unlocked)] enter Initial
>     -6> 2017-09-01 12:27:33.004614 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47922 pg[7.67f( v 30030'90009 (19559'86971,30030'90009]
> local-les=47002 n=12 ec=1511 les/c/f 47869/47141/0 47889/47893/47893)
> [39,13,41] r=0 lpr=0 pi=47001-47892/5 crt=30030'90009 lcod 0'0 mlcod
> 0'0 inactive NIBBLEWISE] exit Initial 0.027772 0 0.000000
>     -5> 2017-09-01 12:27:33.004650 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47922 pg[7.67f( v 30030'90009 (19559'86971,30030'90009]
> local-les=47002 n=12 ec=1511 les/c/f 47869/47141/0 47889/47893/47893)
> [39,13,41] r=0 lpr=0 pi=47001-47892/5 crt=30030'90009 lcod 0'0 mlcod
> 0'0 inactive NIBBLEWISE] enter Reset
>     -4> 2017-09-01 12:27:33.055420 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47954 pg[7.62d(unlocked)] enter Initial
>     -3> 2017-09-01 12:27:33.128309 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47954 pg[7.62d( v 35215'96652 (18780'93637,35215'96652]
> local-les=47898 n=17 ec=1511 les/c/f 47898/42466/0 47889/47889/47889)
> [39,13,18]/[39,13] r=0 lpr=0 pi=42464-47888/34 crt=35215'96652 lcod
> 0'0 mlcod 0'0 inactive NIBBLEWISE] exit Initial 0.072890 0 0.000000
>     -2> 2017-09-01 12:27:33.128343 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47954 pg[7.62d( v 35215'96652 (18780'93637,35215'96652]
> local-les=47898 n=17 ec=1511 les/c/f 47898/42466/0 47889/47889/47889)
> [39,13,18]/[39,13] r=0 lpr=0 pi=42464-47888/34 crt=35215'96652 lcod
> 0'0 mlcod 0'0 inactive NIBBLEWISE] enter Reset
>     -1> 2017-09-01 12:27:33.144109 7f7ebe62c8c0  5 osd.39 pg_epoch:
> 47889 pg[7.65c(unlocked)] enter Initial
>      0> 2017-09-01 12:27:33.151134 7f7ebe62c8c0 -1 *** Caught signal
> (Aborted) **
>  in thread 7f7ebe62c8c0 thread_name:ceph-osd
>
>  ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>  1: (()+0x9770ae) [0x511ab2e0ae]
>  2: (()+0x11390) [0x7f7ebd4ea390]
>  3: (gsignal()+0x38) [0x7f7ebb488428]
>  4: (abort()+0x16a) [0x7f7ebb48a02a]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f7ebbdca84d]
>  6: (()+0x8d6b6) [0x7f7ebbdc86b6]
>  7: (()+0x8d701) [0x7f7ebbdc8701]
>  8: (()+0x8d919) [0x7f7ebbdc8919]
>  9: (()+0x1230f) [0x7f7ebe1c230f]
>  10: (operator new[](unsigned long)+0x4e7) [0x7f7ebe1e64b7]
>  11: (void std::__cxx11::list<pg_log_entry_t,
> std::allocator<pg_log_entry_t> >::_M_insert<pg_log_entry_t
> const&>(std::_List_iterator<pg_log_entry_t>, pg_log_entry_t
> const&)+0x21) [0x511a6f7e21]
>  12: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t,
> pg_info_t const&, std::map<eversion_t, hobject_t,
> std::less<eversion_t>, std::allocator<std::pair<eversion_t const,
> hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&,
> std::__cxx11::basic_ostringstream<char, std::char_traits<char>,
> std::allocator<char> >&, DoutPrefixProvider const*,
> std::set<std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >, std::less<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> std::allocator<std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> > > >*)+0xe0c)
> [0x511a7db99c]
>  13: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2f6) [0x511a60d306]
>  14: (OSD::load_pgs()+0x87a) [0x511a548f0a]
>  15: (OSD::init()+0x2026) [0x511a5541f6]
>  16: (main()+0x2ea5) [0x511a4c5dc5]
>  17: (__libc_start_main()+0xf0) [0x7f7ebb473830]
>  18: (_start()+0x29) [0x511a507459]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    0/ 1 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 rbd_mirror
>    0/ 5 rbd_replay
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 1 ms
>    0/ 1 mon
>    0/10 monc
>    1/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 newstore
>    1/ 5 bluestore
>    1/ 5 bluefs
>    1/ 3 bdev
>    1/ 5 kstore
>    4/ 5 rocksdb
>    4/ 5 leveldb
>    1/ 5 kinetic
>    1/ 5 fuse
>   99/99 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.39.log
> --- end dump of recent events ---
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html