Re: OSDs use 200GB RAM and crash

Had the same issue after a pg_num increase.

Indeed, the convenient solution was to add the needed memory (either a swap
partition or physical RAM).
Things get back to normal after the initial start, so you won't have to
keep that extra RAM in your storage nodes.

This is a really annoying issue, and I never found a proper fix for it. You
may run into it when Ceph decides to change the pg_num. I hope it will be
fixed soon.
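
For what it's worth, one rough way to add temporary swap (the size and path
are only examples, adjust for your nodes):

  # create a temporary swap file on a fast device
  fallocate -l 200G /var/swap.tmp
  chmod 600 /var/swap.tmp
  mkswap /var/swap.tmp
  swapon /var/swap.tmp

  # once the OSDs have settled back to normal memory usage
  swapoff /var/swap.tmp
  rm /var/swap.tmp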

On Wed, 12 Jan 2022 at 04:44, David Yang <gmydw1118@xxxxxxxxx> wrote:

> Hi, I have also encountered this problem before. I did not do any other
> operations, just added an SSD as large as possible to create a swap
> partition.
>
> At the peak, while an OSD is being restored, a storage node can use up to
> 2 TB of swap. Then, after the OSD boots back to normal, the memory is
> released and usage returns to normal.
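>
> (If anyone tries this, swap and memory consumption during recovery can be
> watched with something like: watch -n 10 'free -h; swapon --show'.)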
>
> Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote on Wed, Jan 12, 2022 at 02:14:
>
> > Hi,
> >
> > It sounds like https://tracker.ceph.com/issues/53729
> >
> > -- Dan
> >
> >
> > On Tue, Jan 11, 2022, 18:32 Konstantin Larin <klarin@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > > Hi all,
> > >
> > > We have a problem with our 3-node all-in-one cluster (15.2.15).
> > >
> > > There are 16 OSDs on each node: 16 HDDs for data, plus 4 SSDs for DB.
> > >
> > > At some point, two nodes suffered a simultaneous power outage, with
> > > another subsequent power outage on one of these nodes. The power outage
> > > lasted about an hour. This seemingly also triggered the autoscaler,
> > > which reduced pg_num on busy pools.
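> > > (For reference, the autoscaler's decisions can be inspected with
> > > "ceph osd pool autoscale-status", and it can be switched off per pool
> > > with "ceph osd pool set <pool> pg_autoscale_mode off"; <pool> is a
> > > placeholder.)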
> > >
> > > After that there were numerous OOM kills of OSDs, even on the remaining
> > > node.
> > >
> > > We tried to start the OSDs one by one, but starting any single OSD
> > > results in a loop: it uses almost 200 GB of RAM, aborts itself, and is
> > > then restarted by systemd.
> > >
> > > The largest entries in dump_mempools shortly before the crash are
> > > buffer_anon, at over 100 GB, and osd_pglog, at about 10 GB.
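> > > (dump_mempools was taken from the admin socket, e.g.
> > > "ceph daemon osd.<id> dump_mempools" for the OSD in question.)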
> > >
> > > We have tried manual compaction of the OSD DB (ceph-kvstore-tool) and
> > > trimming the PG log (ceph-objectstore-tool). This has not changed
> > > anything.
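> > > (With the OSD stopped, the commands were roughly of this form, paths
> > > and pgid being illustrative:
> > >   ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact
> > >   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
> > >     --op trim-pg-log --pgid <pgid>
> > > )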
> > >
> > > OSDs have a common traceback after aborting:
> > >
> > >       0> 2022-01-11T11:11:13.644+0200 7f52f01b3700 -1 *** Caught signal (Aborted) **
> > >   in thread 7f52f01b3700 thread_name:tp_osd_tp
> > >
> > >   ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)
> > >   1: (()+0x12c20) [0x7f5313954c20]
> > >   2: (gsignal()+0x10f) [0x7f53125b337f]
> > >   3: (abort()+0x127) [0x7f531259ddb5]
> > >   4: (()+0x9009b) [0x7f5312f6b09b]
> > >   5: (()+0x9653c) [0x7f5312f7153c]
> > >   6: (()+0x96597) [0x7f5312f71597]
> > >   7: (()+0x967f8) [0x7f5312f717f8]
> > >   8: (ceph::buffer::v15_2_0::list::refill_append_space(unsigned int)+0x200) [0x561cfe88f3a0]
> > >   9: (ceph::buffer::v15_2_0::list::append_hole(unsigned int)+0x8b) [0x561cfe88f69b]
> > >   10: (pg_log_dup_t::encode(ceph::buffer::v15_2_0::list&) const+0x38) [0x561cfe2a1a28]
> > >   11: (PGLog::_write_log_and_missing(ceph::os::Transaction&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, pg_log_t&, coll_t const&, ghobject_t const&, eversion_t, eversion_t, eversion_t, std::set<eversion_t, std::less<eversion_t>, std::allocator<eversion_t> >&&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&&, pg_missing_set<true> const&, bool, bool, bool, eversion_t, eversion_t, eversion_t, bool*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xd7c) [0x561cfe1719ec]
> > >   12: (PGLog::write_log_and_missing(ceph::os::Transaction&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, coll_t const&, ghobject_t const&, bool)+0x132) [0x561cfe17c2c2]
> > >   13: (PG::prepare_write(pg_info_t&, pg_info_t&, PastIntervals&, PGLog&, bool, bool, bool, ceph::os::Transaction&)+0x1a6) [0x561cfe130226]
> > >   14: (PeeringState::write_if_dirty(ceph::os::Transaction&)+0x70) [0x561cfe305ce0]
> > >   15: (OSD::split_pgs(PG*, std::set<spg_t, std::less<spg_t>, std::allocator<spg_t> > const&, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, PeeringCtx&)+0x57b) [0x561cfe095abb]
> > >   16: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PeeringCtx&)+0x70c) [0x561cfe0c2e5c]
> > >   17: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xa4) [0x561cfe0c4bb4]
> > >   18: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x561cfe2f6c76]
> > >   19: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x561cfe0b7a5f]
> > >   20: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x561cfe6f6204]
> > >   21: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x561cfe6f8e64]
> > >   22: (()+0x817a) [0x7f531394a17a]
> > >   23: (clone()+0x43) [0x7f5312678dc3]
> > >
> > > There is also a strange thing: ceph osd tree reports some OSDs as up,
> > > when in fact they are not running.
> > >
> > > Could you please advise on this issue?
> > >
> > > Regards,
> > > Konstantin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



