And please also share the output of ceph osd dump | egrep '^flags|^require_osd_release|^pool' (mask the pool names if they contain private info).
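If the pool names are sensitive, something along these lines should be enough to redact them before posting; this is an untested sketch that assumes the usual "pool <id> '<name>' ..." layout of the dump and replaces every name with the literal 'masked':

    ceph osd dump \
        | egrep '^flags|^require_osd_release|^pool' \
        | sed -E "s/^(pool [0-9]+) '[^']+'/\1 'masked'/"    # keep pool ids, hide names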
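The temporary-swap workaround that David and Marius describe below boils down to adding a large chunk of swap on a spare SSD just for the duration of the recovery; a rough sketch follows (the path and size are placeholders, and a dedicated swap partition on the SSD works just as well):

    # one-off swap file on a spare SSD mounted at /mnt/ssd (placeholder path)
    fallocate -l 512G /mnt/ssd/osd-recovery.swap    # size is only an example
    chmod 600 /mnt/ssd/osd-recovery.swap
    mkswap /mnt/ssd/osd-recovery.swap
    swapon /mnt/ssd/osd-recovery.swap

    # once the OSDs have settled back to normal memory usage
    swapoff /mnt/ssd/osd-recovery.swap
    rm /mnt/ssd/osd-recovery.swap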
-- Dan

On Wed, Jan 12, 2022 at 2:47 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> Thanks David and Marius for your input.
>
> Can you share which versions of Ceph you've experienced this with?
> And what is the use case: RGW, CephFS, RBD, mixed ...?
> Is there anything special about your Ceph usage? Small or large files?
> Number of OSDs? Number of PGs? When was the cluster created?
>
> Thanks!
>
> dan
>
> On Wed, Jan 12, 2022 at 6:47 AM Marius Leustean <marius.leus@xxxxxxxxx> wrote:
> >
> > Had the same issue after a pg_num increase.
> >
> > Indeed, the convenient solution was to add the needed memory (either a
> > swap partition or physical RAM). Things get back to normal after the
> > initial start, so you won't have to keep that extra RAM in your storage
> > nodes.
> >
> > This is a really annoying issue, and I never found a proper fix for it.
> > You may run into it whenever Ceph decides to change the pg_num. I hope
> > it will be fixed soon.
> >
> > On Wed, 12 Jan 2022 at 04:44, David Yang <gmydw1118@xxxxxxxxx> wrote:
> >>
> >> Hi, I have also encountered this problem before. I did not do any other
> >> operations, just added an SSD as large as possible to create a swap
> >> partition.
> >>
> >> At its peak, while the OSDs were recovering, a storage node used up 2 TB
> >> of swap. Then, after the OSDs booted back to normal, the memory was
> >> released and usage returned to normal.
> >>
> >> On Wed, Jan 12, 2022 at 02:14, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >>
> >> > Hi,
> >> >
> >> > It sounds like https://tracker.ceph.com/issues/53729
> >> >
> >> > -- Dan
> >> >
> >> >
> >> > On Tue., Jan. 11, 2022, 18:32 Konstantin Larin <klarin@xxxxxxxxxxxxxxxxxx> wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > We have a problem with our 3-node all-in-one cluster (15.2.15).
> >> > >
> >> > > There are 16 OSDs on each node: 16 HDDs for data and 4 SSDs for DB.
> >> > >
> >> > > At some point 2 nodes suffered a simultaneous power outage, with
> >> > > another subsequent outage on one of those nodes. The power outage
> >> > > lasted about an hour. This apparently also triggered the autoscaler,
> >> > > which reduced pg_num on busy pools.
> >> > >
> >> > > After that there were numerous OOM kills of OSDs, even on the
> >> > > remaining node.
> >> > >
> >> > > We tried to start the OSDs one by one, but any single OSD we start
> >> > > goes into a loop: it uses almost 200 GB of RAM, aborts itself, and is
> >> > > then restarted by systemd.
> >> > >
> >> > > The largest numbers in dump_mempools shortly before the crash are
> >> > > buffer_anon, which is over 100 GB, and osd_pglog, which is about 10 GB.
> >> > >
> >> > > We have tried manual compaction of the OSD DB (ceph-kvstore-tool) and
> >> > > trimming the PG log (ceph-objectstore-tool). This has not changed
> >> > > anything.
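For reference, those two offline steps take roughly the following form; the OSD id and PG id are placeholders, and the OSD has to be stopped before running either tool:

    systemctl stop ceph-osd@<id>

    # compact the OSD's RocksDB
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact

    # trim the persisted log of a single PG
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --pgid <pgid> --op trim-pg-log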
> >> > >
> >> > > The OSDs have a common traceback after aborting:
> >> > >
> >> > > 0> 2022-01-11T11:11:13.644+0200 7f52f01b3700 -1 *** Caught signal (Aborted) **
> >> > > in thread 7f52f01b3700 thread_name:tp_osd_tp
> >> > >
> >> > > ceph version 15.2.15 (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable)
> >> > > 1: (()+0x12c20) [0x7f5313954c20]
> >> > > 2: (gsignal()+0x10f) [0x7f53125b337f]
> >> > > 3: (abort()+0x127) [0x7f531259ddb5]
> >> > > 4: (()+0x9009b) [0x7f5312f6b09b]
> >> > > 5: (()+0x9653c) [0x7f5312f7153c]
> >> > > 6: (()+0x96597) [0x7f5312f71597]
> >> > > 7: (()+0x967f8) [0x7f5312f717f8]
> >> > > 8: (ceph::buffer::v15_2_0::list::refill_append_space(unsigned int)+0x200) [0x561cfe88f3a0]
> >> > > 9: (ceph::buffer::v15_2_0::list::append_hole(unsigned int)+0x8b) [0x561cfe88f69b]
> >> > > 10: (pg_log_dup_t::encode(ceph::buffer::v15_2_0::list&) const+0x38) [0x561cfe2a1a28]
> >> > > 11: (PGLog::_write_log_and_missing(ceph::os::Transaction&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, pg_log_t&, coll_t const&, ghobject_t const&, eversion_t, eversion_t, eversion_t, std::set<eversion_t, std::less<eversion_t>, std::allocator<eversion_t> >&&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&&, pg_missing_set<true> const&, bool, bool, bool, eversion_t, eversion_t, eversion_t, bool*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xd7c) [0x561cfe1719ec]
> >> > > 12: (PGLog::write_log_and_missing(ceph::os::Transaction&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*, coll_t const&, ghobject_t const&, bool)+0x132) [0x561cfe17c2c2]
> >> > > 13: (PG::prepare_write(pg_info_t&, pg_info_t&, PastIntervals&, PGLog&, bool, bool, bool, ceph::os::Transaction&)+0x1a6) [0x561cfe130226]
> >> > > 14: (PeeringState::write_if_dirty(ceph::os::Transaction&)+0x70) [0x561cfe305ce0]
> >> > > 15: (OSD::split_pgs(PG*, std::set<spg_t, std::less<spg_t>, std::allocator<spg_t> > const&, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, PeeringCtx&)+0x57b) [0x561cfe095abb]
> >> > > 16: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PeeringCtx&)+0x70c) [0x561cfe0c2e5c]
> >> > > 17: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xa4) [0x561cfe0c4bb4]
> >> > > 18: (ceph::osd::scheduler::PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x56) [0x561cfe2f6c76]
> >> > > 19: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x12ef) [0x561cfe0b7a5f]
> >> > > 20: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x561cfe6f6204]
> >> > > 21: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x561cfe6f8e64]
> >> > > 22: (()+0x817a) [0x7f531394a17a]
> >> > > 23: (clone()+0x43) [0x7f5312678dc3]
> >> > >
> >> > > There is also a strange thing: ceph osd tree reports some OSDs as up
> >> > > when in fact they are not running.
> >> > >
> >> > > Could you please advise on this issue?
> >> > >
> >> > > Regards,
> >> > > Konstantin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx