I have been following this thread for a while, and thought I need to add a "major ceph disaster" alert to the monitoring ;)

http://www.f1-outsourcing.eu/files/ceph-disaster.mp4


-----Original Message-----
From: Kevin Flöh [mailto:kevin.floeh@xxxxxxx]
Sent: Thursday, 23 May 2019 10:51
To: Robert LeBlanc
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Major ceph disaster

Hi,

we have set the PGs to recover and now they are stuck in active+recovery_wait+degraded, and instructing them to deep-scrub does not change anything. Hence, the rados report is empty. Is there a way to stop the recovery wait so the deep-scrub can start and we get the output? I guess the recovery_wait might be caused by missing objects. Do we need to delete them first to get the recovery going?

Kevin

On 22.05.19 6:03 PM, Robert LeBlanc wrote:

On Wed, May 22, 2019 at 4:31 AM Kevin Flöh <kevin.floeh@xxxxxxx> wrote:

    Hi,

    thank you, it worked. The PGs are no longer incomplete. Still, we have another problem: there are 7 inconsistent PGs and ceph pg repair is not doing anything. I just get "instructing pg 1.5dd on osd.24 to repair" and nothing happens. Does somebody know how we can get the PGs to repair?

    Regards,
    Kevin

Kevin,

I just fixed an inconsistent PG yesterday. You will need to figure out why they are inconsistent. Do these steps and then we can figure out how to proceed.

1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of them.)
2. Print out the inconsistent report for each inconsistent PG: `rados list-inconsistent-obj <PG_NUM> --format=json-pretty`
3. Look at the error messages and check whether all the shards have the same data.

Robert LeBlanc
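
For anyone hitting the same problem later, the steps above map onto roughly the following commands. This is only a sketch: <pool> is a placeholder for the affected pool, 1.5dd is simply the example PG from Kevin's mail, and mark_unfound_lost is destructive, so it should only be run once you are sure the missing objects cannot be recovered from any OSD.

    # list the PGs currently flagged inconsistent in a pool
    rados list-inconsistent-pg <pool>

    # step 1: deep-scrub each inconsistent PG
    ceph pg deep-scrub 1.5dd

    # step 2: print the inconsistency report once the deep-scrub has finished
    rados list-inconsistent-obj 1.5dd --format=json-pretty

    # step 3: if the report shows a bad shard, ask ceph to repair from the good copies
    ceph pg repair 1.5dd

    # for the recovery_wait / missing-objects question: check for unfound objects
    ceph pg 1.5dd list_unfound

    # last resort only: give up on unfound objects so recovery can proceed
    # ("revert" rolls back to an older version where possible, "delete" drops them)
    ceph pg 1.5dd mark_unfound_lost revert|delete

Whether recovery_wait clears on its own usually depends on the other PGs ahead of it in the recovery queue; giving up unfound objects is only needed when recovery is genuinely blocked on objects that no surviving OSD holds.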