Hi Greg!
Thank you for your quick reply.
We have now deleted the PG on OSD.130 as you suggested and then
started the OSD:
ceph-s-06 # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130/ \
    --pgid 5.9b --op remove --force
marking collection for removal
setting '_remove' omap key
finish_remove_pgs 5.9b_head removing 5.9b
Remove successful
ceph-s-06 # systemctl start ceph-osd@130.service
The cluster recovered again until it got to PG 5.9b. Then OSD.130
crashed again, so nothing has changed.
So we wanted to try it the other way around and export the PG from the
primary (healthy) OSD (OSD.19), but that fails:
root@ceph-s-03:/tmp5.9b# ceph-objectstore-tool --op export --pgid 5.9b \
    --data-path /var/lib/ceph/osd/ceph-19 --file /tmp5.9b/5.9b.export
OSD has the store locked
But we don't want to stop OSD.19 on this server, because this pool has
size=3 and min_size=2, so stopping it would make PG 5.9b inaccessible.
I'm a bit confused. Are you saying that
1) the ceph-objectstore-tool command you pasted there successfully
   removed pg 5.9b from osd.130 (as it appears), AND
2) pg 5.9b was active with one of the other nodes as primary, so all
   data remained available, AND
3) when pg 5.9b got backfilled onto osd.130, osd.130 crashed again?
   (But the other OSDs kept the PG fully available, without crashing?)
That sequence of events is *deeply* confusing and I
really don't understand how it might happen.
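(If you still have it, the up/acting set at each step would help make
sense of this, e.g. from something like

  ceph pg map 5.9b
  ceph pg 5.9b query

which shows which OSDs were serving the PG and which one was primary
while osd.130 was down.)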
Sadly I don't think you can grab a PG for export without
stopping the OSD in question.
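If you do end up taking that route (either during a window where you
can tolerate pg 5.9b going inactive, or once it has a healthy third
copy again), the sequence would look roughly like the following. This
is only a sketch, so double-check it before running anything:

  ceph osd set noout
  systemctl stop ceph-osd@19.service
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-19 \
      --pgid 5.9b --op export --file /tmp5.9b/5.9b.export
  systemctl start ceph-osd@19.service
  ceph osd unset noout

noout just keeps the cluster from marking osd.19 out and rebalancing
while it is briefly down; the paths and pgid are taken from your own
command above.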
When we query the PG, we can see a lot of entries in "snap_trimq".
Can this be cleaned up somehow, even if the PG is undersized and
degraded?
I *think* the PG will keep trimming snapshots even if
undersized+degraded (though I don't remember for sure), but
snapshot trimming is often heavily throttled and I'm not
aware of any way to specifically push one PG to the front.
If you're interested in speeding snap trimming up, you can search the
archives or check the docs for the appropriate config options.
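For reference, the options people usually look at are
osd_snap_trim_sleep, osd_pg_max_concurrent_snap_trims and
osd_snap_trim_priority, but please check the docs for your release for
the exact names, defaults and sane values before changing anything. On
recent releases that would look something like

  ceph config set osd osd_snap_trim_sleep 0
  ceph config set osd osd_pg_max_concurrent_snap_trims 4

(on older releases the equivalent is injectargs via "ceph tell osd.*").
Treat those values as illustrative examples, not recommendations.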
-Greg