Yay! I've managed to get the cluster up and running again. Deep scrub is
still fixing inconsistencies.

I had to do a depth-first search through the tree of startup errors. My
procedure was the one already described: find and delete the PGs from the
OSDs which trigger the assertion. I've created a script to automate the
seek-and-destroy of the broken PGs (a rough sketch of what it does per PG
is inlined further down, after the crash log):

https://gist.github.com/TheJJ/c6be62e612ac4782bd0aa279d8c82197

Cheers
JJ

On 04/10/2018 18.29, Jonas Jelten wrote:
> Hello!
>
> Unfortunately, our single-node "cluster" with 11 OSDs is broken because some OSDs crash when they start peering.
> I'm on Ubuntu 18.04 with Ceph Mimic (13.2.2).
>
> The problem was induced when RAM filled up and OSD processes then crashed because of memory allocation failures.
>
> No weird commands (e.g. force_create_pg) were used on this cluster and it was set up with 13.2.1 initially.
> The affected pool seems to be a replicated pool with size=3 and min_size=2 (which haven't been changed).
>
> Crash log of osd.4 (only the crashed thread):
>
> 99424: -1577> 2018-10-04 13:40:11.024 7f3838417700 10 log is not dirty
> 99425: -1576> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 1433 queue_want_up_thru want 1433 <= queued 1433, currently 1426
> 99427: -1574> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 3.8 to_process <> waiting <>
> waiting_peering {}
> 99428: -1573> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
> 1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 3.8( v 866'122691 (569'119300,866'122691]
> local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433)) features:
> 0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 0,2),([1308,1311] acting 4,10),([1401,1403] acting
> 2,10),([1426,1428] acting 2,4)) +create_info) prio 255 cost 10 e1433) queued
> 99430: -1571> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 3.8 to_process <OpQueueItem(3.8
> PGPeeringEvent(epoch_sent: 1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 3.8( v
> 866'122691 (569'119300,866'122691] local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0
> 1433/1433/1433)) features: 0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 0,2),([1308,1311] acting
> 4,10),([1401,1403] acting 2,10),([1426,1428] acting 2,4)) +create_info) prio 255 cost 10 e1433)> waiting <>
> waiting_peering {}
> 99433: -1568> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
> 1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 3.8( v 866'122691 (569'119300,866'122691]
> local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433)) features:
> 0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 0,2),([1308,1311] acting 4,10),([1401,1403] acting
> 2,10),([1426,1428] acting 2,4)) +create_info) prio 255 cost 10 e1433) pg 0x56013bc87400
> 99437: -1564> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] do_peering_event: epoch_sent: 1433 epoch_requested:
> 1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 3.8( v 866'122691 (569'119300,866'122691]
> local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433)) features:
> 0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 0,2),([1308,1311] acting 4,10),([1401,1403] acting
> 2,10),([1426,1428] acting 2,4)) +create_info
> 99440: -1561> 2018-10-04 13:40:11.024 7f3838417700 7 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] state<Started/Primary>: handle_pg_notify from osd.2
> 99444: -1557> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] got dup osd.2 info 3.8( v 866'122691
> (569'119300,866'122691] local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433),
> identical to ours
> 99445: -1556> 2018-10-04 13:40:11.024 7f3838417700 10 log is not dirty
> 99446: -1555> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 1433 queue_want_up_thru want 1433 <= queued 1433, currently 1426
> 99448: -1553> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 3.8 to_process <> waiting <>
> waiting_peering {}
> 99450: -1551> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
> 1433 epoch_requested: 1433 MLogRec from 2 +create_info) prio 255 cost 10 e1433) queued
> 99456: -1545> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 3.8 to_process <OpQueueItem(3.8
> PGPeeringEvent(epoch_sent: 1433 epoch_requested: 1433 MLogRec from 2 +create_info) prio 255 cost 10 e1433)> waiting <>
> waiting_peering {}
> 99458: -1543> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
> 1433 epoch_requested: 1433 MLogRec from 2 +create_info) prio 255 cost 10 e1433) pg 0x56013bc87400
> 99461: -1540> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] do_peering_event: epoch_sent: 1433 epoch_requested:
> 1433 MLogRec from 2 +create_info
> 99465: -1536> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] state<Started/Primary/Peering/GetLog>: GetLog:
> received master log from osd2
> 99467: -1534> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] state<Started/Primary/Peering/GetLog>: leaving GetLog
> 99473: -1529> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] state<Started/Primary/Peering/GetLog>: processing
> master log
> 99477: -1525> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
> pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] proc_master_log for osd.2:
> log((866'122691,866'122691], crt=866'122691) missing(0 may_include_deletes = 1)
> 99480: -1522> 2018-10-04 13:40:11.024 7f3838417700 10 merge_log log((866'122691,866'122691], crt=866'122691) from osd.2
> into log((866'124700,866'127774], crt=866'127774)
> 101018: -1> 2018-10-04 13:40:11.028 7f3838417700 -1 /home/jj/cephsrc/ceph-13.2.2/src/osd/PGLog.cc: In function 'void
> PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread
> 7f3838417700 time 2018-10-04 13:40:11.027763
> 101109:2018-10-04 13:40:11.084 7f3838417700 -1 *** Caught signal (Aborted) **
> -1> 2018-10-04 13:40:11.028 7f3838417700 -1 /home/jj/cephsrc/ceph-13.2.2/src/osd/PGLog.cc: In function 'void
> PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool
> /home/jj/cephsrc/ceph-13.2.2/src/osd/PGLog.cc: 345: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)
>
> ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f385abdc822]
> 2: (()+0x26c9e7) [0x7f385abdc9e7]
> 3: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1ad0)
> [0x5601185c7f50]
> 4: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x560118525194]
> 5: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x8b)
> [0x560118566abb]
> 6: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xab) [0x560118566e4b]
> 7: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na,
> mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl
> 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial,
> std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x56011858
> 9: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial,
> std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_ba
> 10: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x143) [0x560118573a53]
> 11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xcf)
> [0x5601184b450f]
> 12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x560118720820]
> 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x598) [0x5601184c4148]
> 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x46e) [0x7f385abe175e]
> 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f385abe37e0]
> 16: (()+0x76db) [0x7f38592bf6db]
> 17: (clone()+0x3f) [0x7f385827188f]
>
> So it seems the contact to osd.2 for pg 3.8 leads to the crash.
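
The assert fires because the two pg logs no longer overlap: osd.4 still has
(866'124700,866'127774] while the copy announced by osd.2 ends at 866'122691,
so olog.head >= log.tail does not hold and merge_log() aborts. Pairs like this
are what my script hunts down. Per broken PG it boils down to roughly the
following (only a sketch, assuming BlueStore and the default paths; the OSD
id, pgid and file names are examples taken from this thread, the gist linked
above is what I actually ran):

  # find the pgid the asserting OSD trips over; with debug logging it shows
  # up in the merge_log / proc_master_log lines right before the abort
  # (assuming the default log location)
  grep -B 5 'FAILED assert(log.head' /var/log/ceph/ceph-osd.4.log

  # the OSD must be stopped before touching its store
  systemctl stop ceph-osd@4

  # always keep an export of the PG before removing anything
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
      --pgid 3.8 --op export --file /root/pg-backup/3.8-from-osd.4.export

  # drop the copy that triggers the assert so the OSD can start and peer again
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
      --pgid 3.8 --op remove --force

  systemctl start ceph-osd@4

The export is the safety net in case dropping that particular copy turns out
to be the wrong call.
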
> pool 3 'ssn_root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 147
> flags hashpspool stripe_width 0 application cephfs
>
> The pg dump for 3.8:
> 3.8 56570 0 0 0 0 0 3074 3074
> down 2018-10-04 18:21:59.450788 866'127774 2554:129697 [4] 4
> [4] 4 541'94186 2018-10-02 21:57:31.049984 427'34480 2018-09-22
> 18:09:40.485861 0
>
> When I stop osd.2, it (osd.4) crashes with:
> -20> 2018-10-04 18:20:20.624 7f0168bef700 10 osd.4 pg_epoch: 2540 pg[3.8( v 866'127774 (866'124700,866'127774]
> local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 2540/2540/2537) [4,10] r=0 lpr=2540
> pi=[859,2540)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] proc_master_log for osd.10:
> log((866'122691,866'122691], crt=866'122691) missing(382 may_include_deletes = 1)
> -19> 2018-10-04 18:20:20.624 7f0168bef700 10 merge_log log((866'122691,866'122691], crt=866'122691) from osd.10 into
> log((866'124700,866'127774], crt=866'127774)
> [same traceback]
>
> So I stop osd.10. Then osd.4 keeps running.
>
> pg 3.8 query says this:
>     "recovery_state": [
>         {
>             "name": "Started/Primary/Peering/Down",
>             "enter_time": "2018-10-04 18:21:59.450774",
>             "comment": "not enough up instances of this PG to go active"
>         },
>         {
>             "name": "Started/Primary/Peering",
>             "enter_time": "2018-10-04 18:21:59.450460",
>             "past_intervals": [
>                 {
>                     "first": "859",
>                     "last": "2550",
>                     "all_participants": [
>                         {
>                             "osd": 0
>                         },
>                         {
>                             "osd": 2
>                         },
>                         {
>                             "osd": 4
>                         },
>                         {
>                             "osd": 10
>                         }
>                     ],
>             [...]
>             "probing_osds": [
>                 "4"
>             ],
>             "blocked": "peering is blocked due to down osds",
>             "down_osds_we_would_probe": [
>                 0,
>                 2,
>                 10
>             ],
>             "peering_blocked_by": [
>                 {
>                     "osd": 2,
>                     "current_lost_at": 0,
>                     "comment": "starting or marking this osd lost may let us proceed"
>                 },
>                 {
>                     "osd": 10,
>                     "current_lost_at": 0,
>                     "comment": "starting or marking this osd lost may let us proceed"
>                 }
>             ]
>
>
> What shall I do? :)
>
> Cheers,
> JJ
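
To answer my own question above: after the seek-and-destroy the cluster is
up again and deep scrub is sorting out the remaining inconsistencies. For
anyone who ends up in the same "peering is blocked due to down osds" state,
the options the query output points at look roughly like this (again only a
sketch with example IDs from this thread; marking an OSD lost throws data
away and is a last resort):

  # see what a PG is still waiting for
  ceph pg 3.8 query

  # a previously exported copy can be imported into a (stopped) OSD again
  # if it turns out to be needed after all
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
      --pgid 3.8 --op import --file /root/pg-backup/3.8-from-osd.4.export

  # last resort if a down OSD really will never come back
  ceph osd lost 2 --yes-i-really-mean-it

  # afterwards, let scrubbing find and fix object-level damage
  ceph pg deep-scrub 3.8
  ceph pg repair 3.8
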