Hi all,
We have a problem with our 3-node all-in-one cluster (Ceph 15.2.15).
There are 16 OSDs on each node: 16 HDDs for data and 4 SSDs for DB.
At some point two nodes suffered a simultaneous power outage, followed
by another outage on one of them; the outage lasted about an hour.
This apparently also triggered the PG autoscaler, which reduced pg_num
on busy pools.
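To prevent further merges we assume the autoscaler can be frozen per
pool, e.g. (pool name is a placeholder):

    ceph osd pool set <pool> pg_autoscale_mode off

but that of course does not undo what already happened.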
After that there were numerous OOM kills of OSDs, even on the
remaining node.
We tried to start the OSDs one by one, but any single OSD we start
goes into a loop: it consumes almost 200 GB of RAM, aborts, and is
then restarted by systemd.
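Between attempts we stop the unit by hand, e.g. (osd.12 is just an
example id):

    systemctl stop ceph-osd@12

We are also considering 'ceph osd set noup' so that half-started OSDs
are not marked up in the meantime.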
The largest entries in dump_mempools shortly before the crash are
buffer_anon (over 100 GB) and osd_pglog (about 10 GB).
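(Those numbers come from the admin socket of the starting OSD, e.g.:

    ceph daemon osd.12 dump_mempools

with osd.12 again just an example id.)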
We have tried manual compaction of the OSD DB (ceph-kvstore-tool) and
trimming the PG log (ceph-objectstore-tool); neither changed anything.
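For reference, the invocations were along these lines (OSD path and
pgid are placeholders):

    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --op trim-pg-log --pgid <pgid>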
The OSDs share a common backtrace after aborting:
0> 2022-01-11T11:11:13.644+0200 7f52f01b3700 -1 *** Caught signal (Aborted) **
in thread 7f52f01b3700 thread_name:tp_osd_tp
ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)
1: (()+0x12c20) [0x7f5313954c20]
2: (gsignal()+0x10f) [0x7f53125b337f]
3: (abort()+0x127) [0x7f531259ddb5]
4: (()+0x9009b) [0x7f5312f6b09b]
5: (()+0x9653c) [0x7f5312f7153c]
6: (()+0x96597) [0x7f5312f71597]
7: (()+0x967f8) [0x7f5312f717f8]
8: (ceph::buffer::v15_2_0::list::refill_append_space(unsigned int)+0x200) [0x561cfe88f3a0]
9: (ceph::buffer::v15_2_0::list::append_hole(unsigned int)+0x8b) [0x561cfe88f69b]
10: (pg_log_dup_t::encode(ceph::buffer::v15_2_0::list&) const+0x38) [0x561cfe2a1a28]
11: (PGLog::_write_log_and_missing(ceph::os::Transaction&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, pg_log_t&, coll_t const&, ghobject_t const&, eversion_t, eversion_t, eversion_t, std::set<eversion_t, std::less<eversion_t>, std::allocator<eversion_t> >&&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&&, pg_missing_set<true> const&, bool, bool, bool, eversion_t, eversion_t, eversion_t, bool*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xd7c) [0x561cfe1719ec]
12: (PGLog::write_log_and_missing(ceph::os::Transaction&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, coll_t const&, ghobject_t const&, bool)+0x132) [0x561cfe17c2c2]
13: (PG::prepare_write(pg_info_t&, pg_info_t&, PastIntervals&, PGLog&, bool, bool, bool, ceph::os::Transaction&)+0x1a6) [0x561cfe130226]
14: (PeeringState::write_if_dirty(ceph::os::Transaction&)+0x70) [0x561cfe305ce0]
15: (OSD::split_pgs(PG*, std::set<spg_t, std::less<spg_t>, std::allocator<spg_t> > const&, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, PeeringCtx&)+0x57b) [0x561cfe095abb]
16: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PeeringCtx&)+0x70c) [0x561cfe0c2e5c]
17: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xa4) [0x561cfe0c4bb4]
18: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x561cfe2f6c76]
19: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x561cfe0b7a5f]
20: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x561cfe6f6204]
21: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x561cfe6f8e64]
22: (()+0x817a) [0x7f531394a17a]
23: (clone()+0x43) [0x7f5312678dc3]
Judging by the OSD::split_pgs frame in the backtrace, we assume the
crash happens while PGs are being reshaped after the autoscaler's
pg_num change. There is also a strange thing: ceph osd tree reports
some OSDs as up when in fact they are not running.
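For example (osd id is a placeholder):

    ceph osd tree | grep 'osd.<id>'     # STATUS column says "up"
    systemctl is-active ceph-osd@<id>   # reports "inactive"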
Could you please advise on this issue?
Regards,
Konstantin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx