Re: Major ceph disaster

Kevin Flöh <kevin.floeh@xxxxxxx> · Thu, 23 May 2019 10:50:43 +0200



    Hi,
    we have set the PGs to recover, and now they are stuck in
      active+recovery_wait+degraded; instructing them to deep-scrub
      does not change anything, so the rados report is empty. Is
      there a way to end the recovery wait so that the deep-scrub can
      start and we can get the output? I guess the recovery_wait might
      be caused by missing objects. Do we need to delete them first to
      get the recovery going?

    
    Kevin

    
    On 22.05.19 6:03 PM, Robert LeBlanc wrote:

    
        On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <kevin.floeh@xxxxxxx>
          wrote:

        
          Hi,

            
            thank you, it worked. The PGs are not incomplete anymore.
            We still have another problem: 7 PGs are inconsistent and
            `ceph pg repair` is not doing anything. I just get
            "instructing pg 1.5dd on osd.24 to repair" and nothing
            happens. Does somebody know how we can get the PGs to
            repair?

            
            Regards,

            
            Kevin

          
          Kevin,
          

          I just fixed an inconsistent PG yesterday. You will need
            to figure out why they are inconsistent. Do these steps and
            then we can figure out how to proceed.
          1. Do a deep-scrub on each PG that is inconsistent. (This
            may fix some of them)
          2. Print out the inconsistent report for each
            inconsistent PG:
            `rados list-inconsistent-obj <PG_NUM> --format=json-pretty`
          3. You will want to look at the error messages and see if
            all the shards have the same data.
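
For step 3, a minimal sketch of how the JSON report could be checked for shard agreement. It is not part of the original message. It assumes the report has a top-level "inconsistents" list whose entries carry "object", "union_shard_errors" and "shards" (each shard with "osd" and "data_digest"); verify those field names against the output of your Ceph release. The function name summarize_inconsistencies is hypothetical. Comparing digests like this is only meaningful for replicated pools, since shards in an erasure-coded pool hold different chunks.

```python
import json
from collections import Counter


def summarize_inconsistencies(report):
    """Return one summary dict per inconsistent object in a parsed report.

    `report` is the dict obtained by parsing the JSON printed by
    `rados list-inconsistent-obj <PG_NUM> --format=json-pretty`.
    The field names used below are assumptions about that output.
    """
    summaries = []
    for item in report.get("inconsistents", []):
        shards = item.get("shards", [])
        # Count how many shards report each data digest (None if absent).
        digests = Counter(shard.get("data_digest") for shard in shards)
        summaries.append({
            "object": item.get("object", {}).get("name"),
            "errors": item.get("union_shard_errors", []),
            # True when every shard reports the same digest.
            "shards_agree": len(digests) <= 1,
            "digests_by_osd": {
                shard.get("osd"): shard.get("data_digest") for shard in shards
            },
        })
    return summaries


def summarize_json(text):
    """Convenience wrapper: parse the JSON text, then summarize it."""
    return summarize_inconsistencies(json.loads(text))
```

One way to use it would be to save the report to a file and pass the file contents to summarize_json; objects with "shards_agree" set to False are the ones whose copies differ and need a closer look.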
          

          Robert LeBlanc
           
        
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com