I have been following this thread for a while, and thought I need to add a "major ceph disaster" alert to the monitoring ;)

http://www.f1-outsourcing.eu/files/ceph-disaster.mp4


-----Original Message-----
From: Kevin Flöh [mailto:kevin.floeh@xxxxxxx]
Sent: Thursday, 23 May 2019 10:51
To: Robert LeBlanc
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Major ceph disaster

Hi,

we have set the PGs to recover and now they are stuck in active+recovery_wait+degraded, and instructing them to deep-scrub does not change anything. Hence, the rados report is empty. Is there a way to stop the recovery wait so the deep-scrub can start and we get the output? I guess the recovery_wait might be caused by missing objects. Do we need to delete them first to get the recovery going?

Kevin

On 22.05.19 6:03 PM, Robert LeBlanc wrote:

On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:

    Hi,

    thank you, it worked. The PGs are no longer incomplete. Still, we have another problem: there are 7 inconsistent PGs and ceph pg repair is not doing anything. I just get "instructing pg 1.5dd on osd.24 to repair" and nothing happens. Does somebody know how we can get the PGs to repair?

    Regards,
    Kevin

Kevin,

I just fixed an inconsistent PG yesterday. You will need to figure out why they are inconsistent. Do these steps and then we can figure out how to proceed.

1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of them.)
2. Print out the inconsistent report for each inconsistent PG: `rados list-inconsistent-obj <PG_NUM> --format=json-pretty`
3. Look at the error messages and check whether all the shards have the same data.

Robert LeBlanc
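
For anyone hitting the same problem later, the steps above map onto roughly the following commands. This is only a sketch: <pool> is a placeholder for the affected pool, 1.5dd is simply the example PG from Kevin's mail, and mark_unfound_lost is destructive, so it should only be run once you are sure the missing objects cannot be recovered from any OSD.

    # list the PGs currently flagged inconsistent in a pool
    rados list-inconsistent-pg <pool>

    # step 1: deep-scrub each inconsistent PG
    ceph pg deep-scrub 1.5dd

    # step 2: print the inconsistency report once the deep-scrub has finished
    rados list-inconsistent-obj 1.5dd --format=json-pretty

    # step 3: if the report shows a bad shard, ask ceph to repair from the good copies
    ceph pg repair 1.5dd

    # for the recovery_wait / missing-objects question: check for unfound objects
    ceph pg 1.5dd list_unfound

    # last resort only: give up on unfound objects so recovery can proceed
    # ("revert" rolls back to an older version where possible, "delete" drops them)
    ceph pg 1.5dd mark_unfound_lost revert|delete

Whether recovery_wait clears on its own usually depends on the other PGs ahead of it in the recovery queue; giving up unfound objects is only needed when recovery is genuinely blocked on objects that no surviving OSD holds.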