Cluster broken and OSDs crash with failed assertion in PGLog::merge_log

Hello!

Unfortunately, our single-node "cluster" with 11 OSDs is broken because some OSDs crash when they start peering.
I'm on Ubuntu 18.04 with Ceph Mimic (13.2.2).

The problem started when RAM filled up and OSD processes crashed because of memory allocation failures.

No weird commands (e.g. force_create_pg) were used on this cluster, and it was initially set up with 13.2.1.
The affected pool seems to be a replicated pool with size=3 and min_size=2 (neither of which has been changed).

Crash log of osd.4 (only the crashed thread):

99424: -1577> 2018-10-04 13:40:11.024 7f3838417700 10 log is not dirty
99425: -1576> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 1433 queue_want_up_thru want 1433 <= queued 1433, currently 1426
99427: -1574> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 3.8 to_process <> waiting <>
waiting_peering {}
99428: -1573> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 3.8( v 866'122691 (569'119300,866'122691]
local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433)) features:
0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 0,2),([1308,1311] acting 4,10),([1401,1403] acting
2,10),([1426,1428] acting 2,4)) +create_info) prio 255 cost 10 e1433) queued
99430: -1571> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 3.8 to_process <OpQueueItem(3.8
PGPeeringEvent(epoch_sent: 1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 3.8( v
866'122691 (569'119300,866'122691] local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0
1433/1433/1433)) features: 0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 0,2),([1308,1311] acting
4,10),([1401,1403] acting 2,10),([1426,1428] acting 2,4)) +create_info) prio 255 cost 10 e1433)> waiting <>
waiting_peering {}
99433: -1568> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
1433 epoch_requested: 1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 3.8( v 866'122691 (569'119300,866'122691]
local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433)) features:
0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 0,2),([1308,1311] acting 4,10),([1401,1403] acting
2,10),([1426,1428] acting 2,4)) +create_info) prio 255 cost 10 e1433) pg 0x56013bc87400
99437: -1564> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] do_peering_event: epoch_sent: 1433 epoch_requested:
1433 MNotifyRec 3.8 from 2 notify: (query:1433 sent:1433 3.8( v 866'122691 (569'119300,866'122691]
local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433)) features:
0x3ffddff8ffa4fffb ([859,1432] intervals=([1213,1215] acting 0,2),([1308,1311] acting 4,10),([1401,1403] acting
2,10),([1426,1428] acting 2,4)) +create_info
99440: -1561> 2018-10-04 13:40:11.024 7f3838417700  7 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] state<Started/Primary>: handle_pg_notify from osd.2
99444: -1557> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}]  got dup osd.2 info 3.8( v 866'122691
(569'119300,866'122691] local-lis/les=1401/1402 n=54053 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433),
identical to ours
99445: -1556> 2018-10-04 13:40:11.024 7f3838417700 10 log is not dirty
99446: -1555> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 1433 queue_want_up_thru want 1433 <= queued 1433, currently 1426
99448: -1553> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 3.8 to_process <> waiting <>
waiting_peering {}
99450: -1551> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
1433 epoch_requested: 1433 MLogRec from 2 +create_info) prio 255 cost 10 e1433) queued
99456: -1545> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process 3.8 to_process <OpQueueItem(3.8
PGPeeringEvent(epoch_sent: 1433 epoch_requested: 1433 MLogRec from 2 +create_info) prio 255 cost 10 e1433)> waiting <>
waiting_peering {}
99458: -1543> 2018-10-04 13:40:11.024 7f3838417700 20 osd.4 op_wq(3) _process OpQueueItem(3.8 PGPeeringEvent(epoch_sent:
1433 epoch_requested: 1433 MLogRec from 2 +create_info) prio 255 cost 10 e1433) pg 0x56013bc87400
99461: -1540> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] do_peering_event: epoch_sent: 1433 epoch_requested:
1433 MLogRec from 2 +create_info
99465: -1536> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] state<Started/Primary/Peering/GetLog>: GetLog:
received master log from osd2
99467: -1534> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] state<Started/Primary/Peering/GetLog>: leaving GetLog
99473: -1529> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] state<Started/Primary/Peering/GetLog>: processing
master log
99477: -1525> 2018-10-04 13:40:11.024 7f3838417700 10 osd.4 pg_epoch: 1433 pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 1433/1433/1433) [4,2] r=0 lpr=1433
pi=[859,1433)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] proc_master_log for osd.2:
log((866'122691,866'122691], crt=866'122691) missing(0 may_include_deletes = 1)
99480: -1522> 2018-10-04 13:40:11.024 7f3838417700 10 merge_log log((866'122691,866'122691], crt=866'122691) from osd.2
into log((866'124700,866'127774], crt=866'127774)
101018:    -1> 2018-10-04 13:40:11.028 7f3838417700 -1 /home/jj/cephsrc/ceph-13.2.2/src/osd/PGLog.cc: In function 'void
PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)' thread
7f3838417700 time 2018-10-04 13:40:11.027763
101109:2018-10-04 13:40:11.084 7f3838417700 -1 *** Caught signal (Aborted) **
    -1> 2018-10-04 13:40:11.028 7f3838417700 -1 /home/jj/cephsrc/ceph-13.2.2/src/osd/PGLog.cc: In function 'void
PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool
/home/jj/cephsrc/ceph-13.2.2/src/osd/PGLog.cc: 345: FAILED assert(log.head >= olog.tail && olog.head >= log.tail)

 ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f385abdc822]
 2: (()+0x26c9e7) [0x7f385abdc9e7]
 3: (PGLog::merge_log(pg_info_t&, pg_log_t&, pg_shard_t, pg_info_t&, PGLog::LogEntryHandler*, bool&, bool&)+0x1ad0)
[0x5601185c7f50]
 4: (PG::merge_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_shard_t)+0x64) [0x560118525194]
 5: (PG::proc_master_log(ObjectStore::Transaction&, pg_info_t&, pg_log_t&, pg_missing_set<false>&, pg_shard_t)+0x8b)
[0x560118566abb]
 6: (PG::RecoveryState::GetLog::react(PG::RecoveryState::GotLog const&)+0xab) [0x560118566e4b]
 7: (boost::statechart::simple_state<PG::RecoveryState::GetLog, PG::RecoveryState::Peering, boost::mpl::list<mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl
 8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial,
std::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events()+0xb3) [0x56011858
 9: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial,
std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_ba
 10: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x143) [0x560118573a53]
 11: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0xcf)
[0x5601184b450f]
 12: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x560118720820]
 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x598) [0x5601184c4148]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x46e) [0x7f385abe175e]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f385abe37e0]
 16: (()+0x76db) [0x7f38592bf6db]
 17: (clone()+0x3f) [0x7f385827188f]

So it seems that contacting osd.2 for pg 3.8 leads to the crash.
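
If I read the failed assert right, the two logs simply don't overlap any more: osd.2 offers log (866'122691,866'122691] (an empty range, tail == head), while osd.4's own log only starts at 866'124700, so olog.head < log.tail and merge_log gives up. Below is a minimal stand-alone sketch of that check, using a simplified stand-in for Ceph's eversion_t and the values from the merge_log line above (not the real Ceph code, just to illustrate why the condition cannot hold):

#include <cassert>
#include <tuple>

// Simplified stand-in for Ceph's eversion_t ("epoch'version" in the logs).
struct eversion {
  unsigned epoch, version;
  bool operator>=(const eversion &o) const {
    return std::tie(epoch, version) >= std::tie(o.epoch, o.version);
  }
};

int main() {
  // osd.4's local log range:  (866'124700, 866'127774]
  eversion log_tail{866, 124700}, log_head{866, 127774};
  // log offered by osd.2:     (866'122691, 866'122691]
  eversion olog_tail{866, 122691}, olog_head{866, 122691};

  // The condition from PGLog.cc:345 requires the two logs to overlap.
  // Here olog_head (866'122691) < log_tail (866'124700), so this aborts,
  // just like osd.4 does.
  assert(log_head >= olog_tail && olog_head >= log_tail);
  return 0;
}

In other words, osd.4 apparently has log entries (up to 866'127774) that osd.2 has never seen, and osd.2's log does not reach back far enough to meet osd.4's log tail at 866'124700.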

The affected pool:
pool 3 'ssn_root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 147
flags hashpspool stripe_width 0 application cephfs

The pg dump for 3.8:
3.8       56570                  0        0         0       0            0 3074     3074
             down 2018-10-04 18:21:59.450788 866'127774 2554:129697                                  [4]          4
                            [4]              4  541'94186 2018-10-02 21:57:31.049984       427'34480 2018-09-22
18:09:40.485861             0

When I stop osd.2, osd.4 crashes again, now on the log from osd.10:
   -20> 2018-10-04 18:20:20.624 7f0168bef700 10 osd.4 pg_epoch: 2540 pg[3.8( v 866'127774 (866'124700,866'127774]
local-lis/les=859/860 n=56570 ec=126/126 lis/c 1401/859 les/c/f 1402/860/0 2540/2540/2537) [4,10] r=0 lpr=2540
pi=[859,2540)/4 crt=866'127774 lcod 0'0 mlcod 0'0 peering mbc={}] proc_master_log for osd.10:
log((866'122691,866'122691], crt=866'122691) missing(382 may_include_deletes = 1)
   -19> 2018-10-04 18:20:20.624 7f0168bef700 10 merge_log log((866'122691,866'122691], crt=866'122691) from osd.10 into
log((866'124700,866'127774], crt=866'127774)
[same traceback]

So I stop osd.10 as well. Then osd.4 keeps running.

pg 3.8 query says this:
  "recovery_state": [
    {
      "name": "Started/Primary/Peering/Down",
      "enter_time": "2018-10-04 18:21:59.450774",
      "comment": "not enough up instances of this PG to go active"
    },
    {
      "name": "Started/Primary/Peering",
      "enter_time": "2018-10-04 18:21:59.450460",
      "past_intervals": [
        {
          "first": "859",
          "last": "2550",
          "all_participants": [
            {
              "osd": 0
            },
            {
              "osd": 2
            },
            {
              "osd": 4
            },
            {
              "osd": 10
            }
          ],
[...]
      "probing_osds": [
        "4"
      ],
      "blocked": "peering is blocked due to down osds",
      "down_osds_we_would_probe": [
        0,
        2,
        10
      ],
      "peering_blocked_by": [
        {
          "osd": 2,
          "current_lost_at": 0,
          "comment": "starting or marking this osd lost may let us proceed"
        },
        {
          "osd": 10,
          "current_lost_at": 0,
          "comment": "starting or marking this osd lost may let us proceed"
        }
      ]


What shall I do? :)

Cheers,
JJ