Re: Ceph Luminous - OSD constantly crashing caused by corrupted placement group

On 17.05.2018 at 00:12, Gregory Farnum wrote:


I'm a bit confused. Are you saying that
1) the ceph-objectstore-tool you pasted there successfully removed pg 5.9b from osd.130 (as it appears), AND
Yes. The ceph-osd process for osd.130 was not running during that phase.
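(For readers who missed the earlier mail: the removal followed the usual ceph-objectstore-tool pattern, roughly the sketch below. The data path is illustrative for our setup, and newer 12.2.x builds may additionally require --force for the remove op.)

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130 \
        --pgid 5.9b --op remove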
2) pg 5.9b was active with one of the other OSDs as primary, so all data remained available, AND
Yes. pg 5.9b has been active the whole time (on two other OSDs). I think osd.19 is the primary for that pg.
"ceph pg 5.9b query" tells me:
.....
    "up": [
        19,
        166
    ],
    "acting": [
        19,
        166
    ],
    "actingbackfill": [
        "19",
        "166"
    ],
....
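(The full query output is long; something like the following pulls out just those fields, assuming jq is available on the host you run it from:)

    ceph pg 5.9b query | jq '{up: .up, acting: .acting}'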

3) when pg 5.9b got backfilled into osd.130, osd.130 crashed again? (But the other OSDs kept the PG fully available, without crashing?)
Yes.

It crashes again with the following lines in the OSD log:
    -2> 2018-05-16 11:11:59.639980 7fe812ffd700  5 -- 10.7.2.141:6800/173031 >> 10.7.2.49:6836/3920 conn(0x5619ed76c000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=24047 cs=1 l=0). rx osd.19 seq 24 0x5619eebd6d00 pg_backfill(progress 5.9b e 505567/505567 lb 5:d97d84eb:::rbd_data.112913b238e1f29.0000000000000ba3:56c06) v3
    -1> 2018-05-16 11:11:59.639995 7fe812ffd700  1 -- 10.7.2.141:6800/173031 <== osd.19 10.7.2.49:6836/3920 24 ==== pg_backfill(progress 5.9b e 505567/505567 lb 5:d97d84eb:::rbd_data.112913b238e1f29.0000000000000ba3:56c06) v3 ==== 955+0+0 (3741758263 0 0) 0x5619eebd6d00 con 0x5619ed76c000
     0> 2018-05-16 11:11:59.645952 7fe7fe7eb700 -1 /build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: In function 'virtual void PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7fe7fe7eb700 time 2018-05-16 11:11:59.640238
/build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5619c11b1a02]
 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0xd63) [0x5619c0d1f873]
 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x2da) [0x5619c0eb15ca]
 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x12e) [0x5619c0eb17fe]
 5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2c1) [0x5619c0ec0d71]
 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x5619c0dcc440]
 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x543) [0x5619c0d30853]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) [0x5619c0ba7539]
 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x5619c0e50f37]
 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1047) [0x5619c0bd5847]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x5619c11b67f4]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5619c11b9830]
 13: (()+0x76ba) [0x7fe8173746ba]
 14: (clone()+0x6d) [0x7fe8163eb41d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


That sequence of events is *deeply* confusing and I really don't understand how it might happen.

Sadly I don't think you can grab a PG for export without stopping the OSD in question.
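(For reference, an offline export of the PG would then look roughly like the sketch below; paths and service names are illustrative, with noout set so the cluster does not start rebalancing while the OSD is down.)

    ceph osd set noout
    systemctl stop ceph-osd@130
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130 \
        --pgid 5.9b --op export --file /root/pg-5.9b.export
    systemctl start ceph-osd@130
    ceph osd unset noout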
 

When we query the pg, we can see a large "snap_trimq".
Can this be cleaned somehow, even if the pg is undersized and degraded?

I *think* the PG will keep trimming snapshots even if undersized+degraded (though I don't remember for sure), but snapshot trimming is often heavily throttled and I'm not aware of any way to specifically push one PG to the front. If you're interested in speeding snaptrimming up you can search the archives or check the docs for the appropriate config options.
-Greg

    Ok. I think we should try that next.
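    (For reference, these are the throttling knobs we plan to look at; a sketch with option names as we understand them in Luminous, to be double-checked against the docs. "ceph daemon" has to be run on the host of the given OSD.)

        # inspect current values on one OSD
        ceph daemon osd.19 config get osd_snap_trim_sleep
        ceph daemon osd.19 config get osd_pg_max_concurrent_snap_trims
        ceph daemon osd.19 config get osd_max_trimming_pgs

        # loosen the throttle cluster-wide at runtime
        ceph tell osd.* injectargs '--osd_pg_max_concurrent_snap_trims 4'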

Thank you!




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
