Re: OSDs use 200GB RAM and crash

We had the exact same issue last week; in the end, unless the dataset can
fit in memory, the OSD will never boot.

To be honest, this bug seems to be hitting quite a few people; in our case
it happened after a pg_num change on a pool.

In the end I had to manually export the PGs from the failed OSD, add them
back into another offline OSD, and mark the failed OSD as lost.
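
For reference, the export/import sequence was roughly along these lines
(only a sketch; OSD IDs, PG IDs and paths are placeholders, and the source
and destination OSDs must both be stopped while ceph-objectstore-tool runs):

    # List the PGs held by the broken OSD.
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<ID> --op list-pgs

    # Export one PG to a file (repeat for each PG you need to save).
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<ID> \
        --pgid <PGID> --op export --file /backup/<PGID>.export

    # Import it into another, stopped OSD.
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<OTHERID> \
        --op import --file /backup/<PGID>.export

    # Once the data is safe elsewhere, mark the broken OSD as lost.
    ceph osd lost <ID> --yes-i-really-mean-it

Start the destination OSD again afterwards and let peering and backfill do
the rest.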

On Tue, 11 Jan 2022 at 18:14, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:

> Hi,
>
> It sounds like https://tracker.ceph.com/issues/53729
>
> -- Dan
>
>
> On Tue., Jan. 11, 2022, 18:32 Konstantin Larin, <klarin@xxxxxxxxxxxxxxxxxx> wrote:
>
> > Hi all,
> >
> > We have a problem with our 3-node all-in-one cluster (15.2.15).
> >
> > There are 16 OSDs on each node, 16 HDDs for data and 4 SSDs for DB.
> >
> > At some point two nodes suffered a simultaneous power outage, with a
> > subsequent second outage on one of those nodes. The outage lasted about
> > an hour. This seemingly also triggered the autoscaler, which reduced
> > pg_num on busy pools.
> >
> > After that there were numerous OOMs that killed OSDs, even on the
> > remaining node.
> >
> > We tried to start OSDs one by one, but starting any single OSD results
> > in it going into a loop: it uses almost 200 GB of RAM, then aborts and
> > is restarted by systemd.
> >
> > The largest entries in dump_mempools shortly before the crash are
> > buffer_anon, at over 100 GB, and osd_pglog, at about 10 GB.
> >
> > We have tried manual compaction of the OSD DB (ceph-kvstore-tool) and
> > trimming the PG log (ceph-objectstore-tool). Neither changed anything.
> >
> > The OSDs share a common traceback after aborting:
> >
> >       0> 2022-01-11T11:11:13.644+0200 7f52f01b3700 -1 *** Caught signal (Aborted) **
> >   in thread 7f52f01b3700 thread_name:tp_osd_tp
> >
> >   ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)
> >   1: (()+0x12c20) [0x7f5313954c20]
> >   2: (gsignal()+0x10f) [0x7f53125b337f]
> >   3: (abort()+0x127) [0x7f531259ddb5]
> >   4: (()+0x9009b) [0x7f5312f6b09b]
> >   5: (()+0x9653c) [0x7f5312f7153c]
> >   6: (()+0x96597) [0x7f5312f71597]
> >   7: (()+0x967f8) [0x7f5312f717f8]
> >   8: (ceph::buffer::v15_2_0::list::refill_append_space(unsigned int)+0x200) [0x561cfe88f3a0]
> >   9: (ceph::buffer::v15_2_0::list::append_hole(unsigned int)+0x8b) [0x561cfe88f69b]
> >   10: (pg_log_dup_t::encode(ceph::buffer::v15_2_0::list&) const+0x38) [0x561cfe2a1a28]
> >   11: (PGLog::_write_log_and_missing(ceph::os::Transaction&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, pg_log_t&, coll_t const&, ghobject_t const&, eversion_t, eversion_t, eversion_t, std::set<eversion_t, std::less<eversion_t>, std::allocator<eversion_t> >&&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&&, pg_missing_set<true> const&, bool, bool, bool, eversion_t, eversion_t, eversion_t, bool*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xd7c) [0x561cfe1719ec]
> >   12: (PGLog::write_log_and_missing(ceph::os::Transaction&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, coll_t const&, ghobject_t const&, bool)+0x132) [0x561cfe17c2c2]
> >   13: (PG::prepare_write(pg_info_t&, pg_info_t&, PastIntervals&, PGLog&, bool, bool, bool, ceph::os::Transaction&)+0x1a6) [0x561cfe130226]
> >   14: (PeeringState::write_if_dirty(ceph::os::Transaction&)+0x70) [0x561cfe305ce0]
> >   15: (OSD::split_pgs(PG*, std::set<spg_t, std::less<spg_t>, std::allocator<spg_t> > const&, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, PeeringCtx&)+0x57b) [0x561cfe095abb]
> >   16: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PeeringCtx&)+0x70c) [0x561cfe0c2e5c]
> >   17: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xa4) [0x561cfe0c4bb4]
> >   18: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x561cfe2f6c76]
> >   19: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x561cfe0b7a5f]
> >   20: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x561cfe6f6204]
> >   21: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x561cfe6f8e64]
> >   22: (()+0x817a) [0x7f531394a17a]
> >   23: (clone()+0x43) [0x7f5312678dc3]
> >
> > There is also a strange thing: ceph osd tree reports some OSDs as up
> > when in fact they are not running.
> >
> > Could you please advise on this issue?
> >
> > Regards,
> > Konstantin
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


