Re: CephFS metadata inconsistent PG Repair Problem

Hi Sean
In our case, the last time we had this error we stopped the OSD, marked it out, let Ceph recover and then reinstalled it. We took this approach because we suspected issues with that OSD itself. Since then (a couple of months now) the PG that was constantly being declared inconsistent has not had any problems.
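
The rough sequence was something like the sketch below (assuming a systemd-based deployment; $OSDID stands for the id of the suspect OSD, adjust for your environment):

systemctl stop ceph-osd@$OSDID    # stop the suspect OSD
ceph osd out $OSDID               # mark it out and let Ceph recover
ceph -s                           # wait for recovery/backfill to finish
# then remove and re-create (reinstall) the OSD with your usual tooling
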
Cheers 
Gonçalo
________________________________________
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Wido den Hollander [wido@xxxxxxxx]
Sent: 20 December 2016 08:29
To: ceph-users; Sean Redmond
Subject: Re: CephFS metadata inconsistent PG Repair Problem

> On 19 December 2016 at 18:14, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
>
>
> Hi Ceph-Users,
>
> I have been running into a few issues with CephFS metadata pool corruption
> over the last few weeks. For background, please see
> tracker.ceph.com/issues/17177
>
> # ceph -v
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>
> I am currently facing a side effect of this issue that is making it
> difficult to repair an inconsistent PG in the metadata pool (pool 5), and I
> could use some pointers.
>
> The PG I am having the issue with is 5.c0:
>
> # ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 1 scrub errors;
> noout,sortbitwise,require_jewel_osds flag(s) set
> pg 5.c0 is active+clean+inconsistent, acting [38,10,29]
> 1 scrub errors
> noout,sortbitwise,require_jewel_osds flag(s) set
> #
>
> ceph pg 5.c0 query = http://pastebin.com/9yqrArTg
>
> rados list-inconsistent-obj 5.c0 | python -m json.tool =
> http://pastebin.com/iZV1TfxE
>
> I have looked at the error log and it reports:
>
> 2016-12-19 16:43:36.944457 osd.38 172.27.175.12:6800/194902 10 : cluster
> [ERR] 5.c0 shard 38: soid 5:035881fa:::10002639cb6.00000000:head
> omap_digest 0xc54c7938 != best guess omap_digest 0xb6531260 from auth shard 10
>
> If I attempt to repair this using 'ceph pg repair 5.c0', the cluster health
> returns to OK, but if I then force a deep scrub using 'ceph pg deep-scrub
> 5.c0', the same error is reported with exactly the same omap_digest values.
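>
> The cycle I keep repeating is essentially the following (the scrub commands
> return immediately, so I wait for the deep-scrub to complete before
> checking):
>
> ceph pg repair 5.c0           # health goes back to OK
> ceph pg deep-scrub 5.c0       # force another deep scrub
> ceph health detail            # the same inconsistency is back
> rados list-inconsistent-obj 5.c0 | python -m json.tool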
>
> To understand the differences between the three OSDs, I performed the steps
> below on each of OSDs 38, 10 and 29:
>
> - Stop the OSD
> - ceph-objectstore-tool --op list --pgid 5.c0 --data-path
> /var/lib/ceph/osd/ceph-$OSDID | grep 10002639cb6 (The output is used in the
> next command)
> - ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSDID
> '["5.c0",{"oid":"10002639cb6.00000000","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]'
> list-omap
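>
> (For each OSD I captured the list-omap output to a file, roughly as below
> for osd.10; the same was done for osd.29 and osd.38 with the data path
> changed.)
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 \
> '["5.c0",{"oid":"10002639cb6.00000000","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]' \
> list-omap > osd10-5.c0.txt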
>
> Taking the output of the above, I ran a diff and found that osd.38 has the
> difference below:
>
> # diff osd10-5.c0.txt osd38-5.c0.txt
> 4405a4406
> > B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head
> #
>
> I assumed the above is a file name; using a find on the file system, I
> confirmed the file does not exist, so I must assume it was deleted, which
> is expected. I am therefore happy to try and correct this difference.
>
> As 'ceph pg repair 5.c0' was not working, I next tried following
> http://ceph.com/planet/ceph-manually-repair-object/ to remove the object
> from the file system. A deep-scrub before the repair reports the object as
> missing; after running the repair the object is copied back onto osd.38,
> but a further deep-scrub then returns exactly the same omap_digest values,
> with osd.38 again showing a difference (http://pastebin.com/iZV1TfxE).
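>
> (Roughly what I did, following that post; the backup location is just an
> example:)
>
> # with osd.38 stopped and its journal flushed (ceph-osd -i 38 --flush-journal)
> find /var/lib/ceph/osd/ceph-38/current/5.c0_head/ -name '10002639cb6*'
> mv <file found above> /root/backup/   # move the object file out of the way
> # start osd.38 again, then:
> ceph pg repair 5.c0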
>
> I assume this is because the omap data is stored inside LevelDB and not
> just in extended attributes:
>
> getfattr -d
> /var/lib/ceph/osd/ceph-38/current/5.60_head/DIR_0/DIR_6/DIR_C/DIR_A/100008ad724.00000000__head_CD74AC60__5
> = http://pastebin.com/4Mc2mNNj
>
> I tried to dig further into this by looking at the value of the omap key
> using:
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38
> '["5.c0",{"oid":"10002639cb6.00000000","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]'
> get-omap B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head
> output = http://pastebin.com/vVUmw9Qi
>
> I also tried this on osd.29 using the command below and found it strange
> that the value exists there, even though the key
> 'B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head' is not listed in the output
> of list-omap:
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29
> '["5.c0",{"oid":"10002639cb6.00000000","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]'
> get-omap B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head
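>
> (One thing I have not tried yet: presumably the stray key could also be
> dropped directly from osd.38 with the objectstore tool while the OSD is
> stopped, along the lines of the untested sketch below, and then verified
> with another deep-scrub.)
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38 \
> '["5.c0",{"oid":"10002639cb6.00000000","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]' \
> rm-omap B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head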
>
> I may be walking down the wrong track, but if anyone has any pointers that
> could help with repairing this PG, or anything else I should be looking at
> to investigate further, that would be very helpful.
>

Thinking out loud: what about using ceph-objectstore-tool to export the PG from a healthy OSD (you have to stop it for a moment) and then importing it into osd.38 with the same tool? Something like this (commands sketched after the steps):

1. Stop osd.38
2. Stop osd.10
3. Export on osd.10
4. Import on osd.38
5. Start osd.10
6. Wait 5 min for PG peering and recovery
7. Start osd.38
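
A rough, untested sketch (the export file name is just an example, and the existing copy of the PG on osd.38 presumably has to be removed before the import will be accepted):

# on the host of the (stopped) osd.10: export the healthy copy of the PG
ceph-objectstore-tool --op export --pgid 5.c0 \
  --data-path /var/lib/ceph/osd/ceph-10 --file /tmp/pg5.c0.export

# on the host of the (stopped) osd.38: drop the bad copy, then import it
ceph-objectstore-tool --op remove --pgid 5.c0 \
  --data-path /var/lib/ceph/osd/ceph-38
ceph-objectstore-tool --op import \
  --data-path /var/lib/ceph/osd/ceph-38 --file /tmp/pg5.c0.export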

Haven't tried this on a live system, but it's something that popped into my mind.

Wido

> Thanks
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
