Re: Major ceph disaster

Dan van der Ster <dan@xxxxxxxxxxxxxx> · Thu, 23 May 2019 10:54:50 +0200

What's the full ceph status?
Normally recovery_wait just means that the relevant OSDs are busy
recovering/backfilling another PG.

On Thu, May 23, 2019 at 10:53 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>
> Hi,
>
> we have set the PGs to recover, and now they are stuck in active+recovery_wait+degraded. Instructing them to deep-scrub does not change anything, so the rados report is empty. Is there a way to end the recovery_wait so that the deep-scrub can start and we get the output? I guess the recovery_wait might be caused by missing objects. Do we need to delete them first to get the recovery going?
>
> Kevin
>
> On 22.05.19 6:03 PM, Robert LeBlanc wrote:
>
> On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:
>>
>> Hi,
>>
>> thank you, it worked. The PGs are not incomplete anymore. Still we have
>> another problem, there are 7 PGs inconsistent and a ceph pg repair is
>> not doing anything. I just get "instructing pg 1.5dd on osd.24 to
>> repair" and nothing happens. Does somebody know how we can get the PGs
>> to repair?
>>
>> Regards,
>>
>> Kevin
>
>
> Kevin,
>
> I just fixed an inconsistent PG yesterday. You will need to figure out why they are inconsistent. Do these steps and then we can figure out how to proceed.
> 1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of them)
> 2. Print out the inconsistent report for each inconsistent PG. `rados list-inconsistent-obj <PG_NUM> --format=json-pretty`
> 3. You will want to look at the error messages and see if all the shards have the same data.
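For step 3, a rough sketch of how the shard comparison could be scripted once the report is no longer empty. It assumes the JSON layout that `rados list-inconsistent-obj` normally prints (an `inconsistents` list whose entries carry `object`, `errors`, `union_shard_errors` and `shards`), and it assumes a replicated pool: in an erasure-coded pool each shard holds a different chunk, so comparing digests across shards would not be meaningful there. The function name `summarize_inconsistent`, the object name and the OSD numbers other than osd.24 are made up for illustration.

```python
import json


def summarize_inconsistent(report_json):
    """Summarize a `rados list-inconsistent-obj --format=json` report.

    Returns one dict per inconsistent object. An empty report (no
    completed deep-scrub yet) simply yields an empty list.
    """
    report = json.loads(report_json)
    summary = []
    for item in report.get("inconsistents", []):
        shards = item.get("shards", [])
        # Assumption: replicated pool, so every shard should be identical.
        digests = {s.get("data_digest") for s in shards}
        sizes = {s.get("size") for s in shards}
        summary.append({
            "object": item.get("object", {}).get("name"),
            "errors": item.get("errors", []),
            "union_shard_errors": item.get("union_shard_errors", []),
            "shards_match": len(digests) == 1 and len(sizes) == 1,
            # OSDs whose own shard reports at least one error.
            "bad_osds": sorted(s["osd"] for s in shards if s.get("errors")),
        })
    return summary


# Hypothetical sample data, not output from the cluster in this thread.
SAMPLE = json.dumps({
    "epoch": 1,
    "inconsistents": [{
        "object": {"name": "example_object", "snap": "head"},
        "errors": ["data_digest_mismatch"],
        "union_shard_errors": ["data_digest_mismatch_info"],
        "shards": [
            {"osd": 24, "errors": ["data_digest_mismatch_info"],
             "size": 4096, "data_digest": "0xaaaaaaaa"},
            {"osd": 31, "errors": [],
             "size": 4096, "data_digest": "0xbbbbbbbb"},
            {"osd": 57, "errors": [],
             "size": 4096, "data_digest": "0xbbbbbbbb"},
        ],
    }],
})

if __name__ == "__main__":
    for entry in summarize_inconsistent(SAMPLE):
        print(entry)
```

In practice the report for one PG could be saved with something like `rados list-inconsistent-obj 1.5dd --format=json > report.json` and the file contents passed to the function. It only reads the report and does not decide which copy is the good one.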
>
> Robert LeBlanc
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com