Re: CephFS metadata inconsistent PG Repair Problem

> Op 19 december 2016 om 18:14 schreef Sean Redmond <sean.redmond1@xxxxxxxxx>:
> 
> 
> Hi Ceph-Users,
> 
> I have been running into a few issues with CephFS metadata pool corruption
> over the last few weeks. For background please see
> tracker.ceph.com/issues/17177
> 
> # ceph -v
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
> 
> I am currently facing a side effect of this issue that is making it
> difficult to repair an inconsistent PG in the metadata pool (pool 5), and I
> could use some pointers.
> 
> The PG I am having the issue with is 5.c0:
> 
> # ceph health detail
> HEALTH_ERR 1 pgs inconsistent; 1 scrub errors;
> noout,sortbitwise,require_jewel_osds flag(s) set
> pg 5.c0 is active+clean+inconsistent, acting [38,10,29]
> 1 scrub errors
> noout,sortbitwise,require_jewel_osds flag(s) set
> #
> 
> ceph pg 5.c0 query = http://pastebin.com/9yqrArTg
> 
> rados list-inconsistent-obj 5.c0 | python -m json.tool =
> http://pastebin.com/iZV1TfxE
> 
> I have looked at the error log and it reports:
> 
> 2016-12-19 16:43:36.944457 osd.38 172.27.175.12:6800/194902 10 : cluster
> [ERR] 5.c0 shard 38: soid 5:035881fa:::10002639cb6.00000000:head
> omap_digest 0xc54c7938 != best guess omap_digest 0xb6531260 from auth shard 10
> 
> If I attempt to repair this using 'ceph pg repair 5.c0', the cluster
> health returns to OK, but if I force a deep scrub using 'ceph pg deep-scrub
> 5.c0' the same error is reported with exactly the same omap_digest values.
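
For reference, the repair/verify cycle you describe comes down to roughly this (just a sketch, reusing the commands already quoted above; give the deep-scrub time to finish before checking the health output):

# ceph pg repair 5.c0
# ceph pg deep-scrub 5.c0
# ceph health detail
# rados list-inconsistent-obj 5.c0 | python -m json.tool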
> 
> To understand the differences between the three OSDs I performed the steps
> below on each of the OSDs 38, 10 and 29:
> 
> - Stop the OSD
> - ceph-objectstore-tool --op list --pgid 5.c0 --data-path
> /var/lib/ceph/osd/ceph-$OSDID | grep 10002639cb6 (the output is used in the
> next command)
> - ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSDID
> '["5.c0",{"oid":"10002639cb6.00000000","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]'
> list-omap
> 
> Taking the output of the above I ran a diff and found that osd.38 has the
> difference below:
> 
> # diff osd10-5.c0.txt osd38-5.c0.txt
> 4405a4406
> > B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head
> #
> 
> I assumed the line above is a file name. Using a find on the file system I
> confirmed the file does not exist, so I must assume it was deleted, which
> is expected, and I am happy to try and correct this difference.
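
For reference, that dump-and-diff comparison could be scripted roughly like this; an untested sketch, with only the OSD in question stopped while ceph-objectstore-tool runs against it, and the object spec taken from the list output above:

OSDID=38   # repeat with 10 and 29
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSDID \
  '["5.c0",{"oid":"10002639cb6.00000000","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]' \
  list-omap > osd$OSDID-5.c0.txt
diff osd10-5.c0.txt osd38-5.c0.txt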
> 
> As 'ceph pg repair 5.c0' was not working, I next tried following
> http://ceph.com/planet/ceph-manually-repair-object/ to remove the object
> from the file system. A deep-scrub before the repair reports the object as
> missing; after running the repair the object is copied back onto osd.38,
> but a further deep-scrub returns exactly the same omap_digest values, with
> osd.38 having a difference (http://pastebin.com/iZV1TfxE).
> 
> I assume this is because the omap data is stored inside LevelDB and not
> just as extended attributes:
> 
> getfattr -d
> /var/lib/ceph/osd/ceph-38/current/5.60_head/DIR_0/DIR_6/DIR_C/DIR_A/100008ad724.00000000__head_CD74AC60__5
> = http://pastebin.com/4Mc2mNNj
> 
> I tried to dig further into this by looking at the value of the omap key
> using:
> 
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38
> '["5.c0",{"oid":"10002639cb6.00000000","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]'
> get-omap B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head
> output = http://pastebin.com/vVUmw9Qi
> 
> I also tried this on osd.29 using the command below and found it strange
> that a value exists, even though the key
> 'B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head' is not listed in the output
> of list-omap:
> 
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29
> '["5.c0",{"oid":"10002639cb6.00000000","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]'
> get-omap B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head
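
If it comes to manually dropping the stray key on osd.38, ceph-objectstore-tool also has per-key set-omap/rm-omap subcommands (at least in the Jewel builds I have seen); completely untested for this case, so take an export of the PG first as a backup. A rough sketch:

# osd.38 stopped, and the PG exported somewhere safe first
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-38 \
  '["5.c0",{"oid":"10002639cb6.00000000","key":"","snapid":-2,"hash":1602296512,"max":0,"pool":5,"namespace":"","max":0}]' \
  rm-omap B6492C5C-A917-A77F-5F301516EC6448F5.jpg_head

A deep-scrub afterwards should show whether the omap_digest values line up again.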
> 
> I may be walking down the wrong track, but if anyone has any pointers that
> could help with repairing this PG, or anything else I should be looking at
> to investigate further, that would be very helpful.
> 

Thinking out loud, what about using ceph-objectstore-tool to export the PG from a healthy OSD (you have to stop it for a moment) and importing it with the same tool into osd.38?

1. Stop osd.38
2. Stop osd.10
3. Export on osd.10
4. Import on osd.38
5. Start osd.10
6. Wait 5 min for PG peering and recovery
7. Start osd.38

Haven't tried this on a system, but something that popped up in my mind.
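
Roughly, assuming the default data and journal paths, that would be something like this (untested, and the flags can differ a bit between versions):

# on the host with osd.10 (osd.10 stopped):
ceph-objectstore-tool --op export --pgid 5.c0 \
  --data-path /var/lib/ceph/osd/ceph-10 \
  --journal-path /var/lib/ceph/osd/ceph-10/journal \
  --file /root/pg5.c0.export

# copy the export file to the osd.38 host if needed, then with osd.38
# stopped remove the local (bad) copy of the PG and import the exported one:
ceph-objectstore-tool --op remove --pgid 5.c0 \
  --data-path /var/lib/ceph/osd/ceph-38 \
  --journal-path /var/lib/ceph/osd/ceph-38/journal
ceph-objectstore-tool --op import \
  --data-path /var/lib/ceph/osd/ceph-38 \
  --journal-path /var/lib/ceph/osd/ceph-38/journal \
  --file /root/pg5.c0.export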

Wido

> Thanks
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


