Re: pg 17.36 is active+clean+inconsistent head expected clone 1 missing?

Hi Steve,

I was just about to follow your steps[0] with the ceph-objectstore-tool 
(I do not want to remove any more snapshots).

So I have this error:
pg 17.36 is active+clean+inconsistent, acting [7,29,12]

2019-09-02 14:17:34.175139 7f9b3f061700 -1 log_channel(cluster) log [ERR] : deep-scrub 17.36 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:head : expected clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4 1 missing

I removed the snapshot with snapshot id 4 and did a pg repair, without 
any result.
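
For reference, this is roughly what I have been running against it (the 
list-inconsistent-snapset call is my assumption of how to see the 
flagged clone; list-inconsistent-obj only gives me an empty list, as in 
my earlier mail below):

ceph pg deep-scrub 17.36
ceph pg repair 17.36

# per-object scrub errors (returns an empty list here)
rados list-inconsistent-obj 17.36 --format=json-pretty

# my assumption: the snapset variant should report the missing clone
rados list-inconsistent-snapset 17.36 --format=json-pretty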

I am trying to understand this command of yours:

ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-229/ --pgid 2.9a6 '{"oid":"rb.0.2479b45.238e1f29","snapid":-2,"hash":2320771494,"max":0,"pool":2,"namespace":"","max":0}'

I think you are getting this info from the --op list output, no? And 
grepping for the "rbd_data.1f114174b0dc51.0000000000000974" occurrence? 
I have these entries on osd.29:

["17.36",{"oid":"rbd_data.1f114174b0dc51.0000000000000974","key":"","sna
pid":63,"hash":1357874486,"max":0,"pool":17,"namespace":"","max":0}]
["17.36",{"oid":"rbd_data.1f114174b0dc51.0000000000000974","key":"","sna
pid":-2,"hash":1357874486,"max":0,"pool":17,"namespace":"","max":0}]

So I guess snapids of -2 are bad? I have actually noticed quite a few -2 
listings in the --op list output, and I do not understand why there are 
so many when the cluster is healthy except for this pg 17.36.
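
To check my understanding, this is roughly what I am planning to run 
(the OSD has to be stopped first as far as I know; the 
remove-clone-metadata step at the end is only my assumption from the man 
page, I have not run it yet):

# stop the OSD so ceph-objectstore-tool can open its objectstore
systemctl stop ceph-osd@29

# list the objects in the pg and keep the ones for this rbd prefix
ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-29/ \
    --pgid 17.36 --op list | grep rbd_data.1f114174b0dc51.0000000000000974

# assumption: drop the stale clone metadata for snapid 4 from the head
# object, then bring the OSD back and repair the pg
ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-29/ \
    --pgid 17.36 \
    '{"oid":"rbd_data.1f114174b0dc51.0000000000000974","key":"","snapid":-2,"hash":1357874486,"max":0,"pool":17,"namespace":""}' \
    remove-clone-metadata 4

systemctl start ceph-osd@29
ceph pg repair 17.36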


[0]
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg47212.html




-----Original Message-----
From: Steve Anthony [mailto:sma310@xxxxxxxxxx] 
Sent: vrijdag 16 november 2018 17:44
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  pg 17.36 is active+clean+inconsistent head 
expected clone 1 missing?

Looks similar to a problem I had after several OSDs crashed while 
trimming snapshots. In my case, the primary OSD thought the snapshot was 
gone, but some of the replicas still had it, so scrubbing flagged it.

First I purged all snapshots and then ran ceph pg repair on the 
problematic placement groups. The first time I encountered this, that 
action was sufficient to repair the problem. The second time, however, I 
ended up having to manually remove the snapshot objects.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027431.html

Once I had done that, repairing the placement group fixed the issue.
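
Roughly, the sequence was something like this (pool, image and pg names 
below are placeholders, not the ones from my cluster):

rbd snap purge <pool>/<image>    # remove all snapshots of the affected image
ceph pg deep-scrub <pgid>        # re-scrub so the error state is current
ceph pg repair <pgid>            # then repair the inconsistent pg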

-Steve

On 11/16/2018 04:00 AM, Marc Roos wrote:
>  
>
> I am not sure that is going to work, because I have had this error for 
> quite some time, from before I added the 4th node. And on the 3-node 
> cluster it was:
>  
> osdmap e18970 pg 17.36 (17.36) -> up [9,0,12] acting [9,0,12]
>
> If I understand correctly, what you intend to do is move the data 
> around. This was sort of accomplished by adding the 4th node.
>
>
>
> -----Original Message-----
> From: Frank Yu [mailto:flyxiaoyu@xxxxxxxxx]
> Sent: vrijdag 16 november 2018 3:51
> To: Marc Roos
> Cc: ceph-users
> Subject: Re:  pg 17.36 is active+clean+inconsistent head 
> expected clone 1 missing?
>
> Try to restart osd.29, then use pg repair. If this doesn't work, or it 
> appears again after a while, check the HDD used for osd.29; maybe there 
> is a bad sector on the disk, in which case just replace the disk with a 
> new one.
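>
> Something like this, for example (assuming systemd-managed OSDs and 
> smartmontools; the device name is only a placeholder):
>
> # restart the OSD daemon, then retry the repair
> systemctl restart ceph-osd@29
> ceph pg repair 17.36
>
> # check the backing disk for media errors (placeholder device name)
> smartctl -a /dev/sdX | grep -i -e reallocated -e pending -e uncorrect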
>
>
>
> On Thu, Nov 15, 2018 at 5:00 PM Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> wrote:
>
>
> 	 
> 	Forgot, these are bluestore osds
> 	
> 	
> 	
> 	-----Original Message-----
> 	From: Marc Roos 
> 	Sent: donderdag 15 november 2018 9:59
> 	To: ceph-users
> 	Subject:  pg 17.36 is active+clean+inconsistent head 
> 	expected clone 1 missing?
> 	
> 	
> 	
> 	I thought I would give it another try, asking again here since 
> 	there is another current thread. I have had this error for a 
> 	year or so.
> 	
> 	This I of course already tried:
> 	ceph pg deep-scrub 17.36
> 	ceph pg repair 17.36
> 	
> 	
> 	[@c01 ~]# rados list-inconsistent-obj 17.36 
> 	{"epoch":24363,"inconsistents":[]}
> 	
> 	
> 	[@c01 ~]# ceph pg map 17.36
> 	osdmap e24380 pg 17.36 (17.36) -> up [29,12,6] acting [29,12,6]
> 	
> 	
> 	[@c04 ceph]# zgrep ERR ceph-osd.29.log*gz
> 	ceph-osd.29.log-20181114.gz:2018-11-13 14:19:55.766604 7f25a05b1700 -1 log_channel(cluster) log [ERR] : deep-scrub 17.36 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:head expected clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4 1 missing
> 	ceph-osd.29.log-20181114.gz:2018-11-13 14:24:55.943454 7f25a05b1700 -1 log_channel(cluster) log [ERR] : 17.36 deep-scrub 1 errors
> 	
> 	
> 	_______________________________________________
> 	ceph-users mailing list
> 	ceph-users@xxxxxxxxxxxxxx
> 	http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 	
>
>
>

--
Steve Anthony
LTS HPC Senior Analyst
Lehigh University
sma310@xxxxxxxxxx


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


