A quick update on my issue. I noticed that while I was trying to move the problem object between OSDs, the file attributes were lost on one of the OSDs, which I guess is why the error messages showed the missing-attribute part.
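For reference, the attribute copy described below was roughly along these lines (a sketch only: the data paths are the defaults, osd.28 stands in for whichever replica still had the attributes, and each OSD has to be stopped while ceph-objectstore-tool is run against it):

  # on a stopped OSD that still has the attributes, export object_info ('_') and snapset
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-28 --pgid 18.2 \
      '.dir.default.80018061.2' get-attr _ > /tmp/attr_oi
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-28 --pgid 18.2 \
      '.dir.default.80018061.2' get-attr snapset > /tmp/attr_snapset

  # copy the two files to the host of the OSD that lost them, then write them back
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 --pgid 18.2 \
      '.dir.default.80018061.2' set-attr _ /tmp/attr_oi
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 --pgid 18.2 \
      '.dir.default.80018061.2' set-attr snapset /tmp/attr_snapset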
I then copied the attribute metadata back onto the problematic object and restarted the OSDs in question. Following a pg repair, I got a different error:
2018-06-19 13:51:05.846033 osd.21 osd.21 192.168.168.203:6828/24339 2 : cluster [ERR] 18.2 shard 21: soid 18:45f87722:::.dir.default.80018061.2:head omap_digest 0x25e8a1da != omap_digest 0x21c7f871 from auth oi 18:45f87722:::.dir.default.80018061.2:head(106137'603495 osd.21.0:41403910 dirty|omap|data_digest|omap_digest s 0 uv 603494 dd ffffffff od 21c7f871 alloc_hint [0 0 0])
2018-06-19 13:51:05.846042 osd.21 osd.21 192.168.168.203:6828/24339 3 : cluster [ERR] 18.2 shard 28: soid 18:45f87722:::.dir.default.80018061.2:head omap_digest 0x25e8a1da != omap_digest 0x21c7f871 from auth oi 18:45f87722:::.dir.default.80018061.2:head(106137'603495 osd.21.0:41403910 dirty|omap|data_digest|omap_digest s 0 uv 603494 dd ffffffff od 21c7f871 alloc_hint [0 0 0])
2018-06-19 13:51:05.846046 osd.21 osd.21 192.168.168.203:6828/24339 4 : cluster [ERR] 18.2 soid 18:45f87722:::.dir.default.80018061.2:head: failed to pick suitable auth object
2018-06-19 13:51:05.846118 osd.21 osd.21 192.168.168.203:6828/24339 5 : cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no '_' attr
2018-06-19 13:51:05.846129 osd.21 osd.21 192.168.168.203:6828/24339 6 : cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no 'snapset' attr
2018-06-19 13:51:09.810878 osd.21 osd.21 192.168.168.203:6828/24339 7 : cluster [ERR] 18.2 repair 4 errors, 0 fixed
It mentions that there is an incorrect omap_digest. How do I go about fixing this?
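In case it helps with the diagnosis, this is the sort of comparison I can run to see where the replicas diverge (a sketch only: the pg id and object name are taken from the log above, rados talks to the live cluster, and ceph-objectstore-tool again needs the OSD stopped):

  # show which shards and digests the scrub actually flagged
  rados list-inconsistent-obj 18.2 --format=json-pretty

  # dump the omap keys of the bucket index object from each replica and compare
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 --pgid 18.2 \
      '.dir.default.80018061.2' list-omap > /tmp/omap.osd21
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-28 --pgid 18.2 \
      '.dir.default.80018061.2' list-omap > /tmp/omap.osd28
  # (copy the dumps to one host if the OSDs live on different machines)
  diff /tmp/omap.osd21 /tmp/omap.osd28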
Cheers
From: "andrei" <andrei@xxxxxxxxxx>
To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Tuesday, 19 June, 2018 11:16:22
Subject: fixing unrepairable inconsistent PG
Hello everyone,

I am having trouble repairing one inconsistent and stubborn PG. I get the following error in ceph.log:

2018-06-19 11:00:00.000225 mon.arh-ibstorage1-ib mon.0 192.168.168.201:6789/0 675 : cluster [ERR] overall HEALTH_ERR noout flag(s) set; 4 scrub errors; Possible data damage: 1 pg inconsistent; application not enabled on 4 pool(s)
2018-06-19 11:09:24.586392 mon.arh-ibstorage1-ib mon.0 192.168.168.201:6789/0 841 : cluster [ERR] Health check update: Possible data damage: 1 pg inconsistent, 1 pg repair (PG_DAMAGED)
2018-06-19 11:09:27.139504 osd.21 osd.21 192.168.168.203:6828/4003 2 : cluster [ERR] 18.2 soid 18:45f87722:::.dir.default.80018061.2:head: failed to pick suitable object info
2018-06-19 11:09:27.139545 osd.21 osd.21 192.168.168.203:6828/4003 3 : cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no '_' attr
2018-06-19 11:09:27.139550 osd.21 osd.21 192.168.168.203:6828/4003 4 : cluster [ERR] repair 18.2 18:45f87722:::.dir.default.80018061.2:head no 'snapset' attr
2018-06-19 11:09:35.484402 osd.21 osd.21 192.168.168.203:6828/4003 5 : cluster [ERR] 18.2 repair 4 errors, 0 fixed
2018-06-19 11:09:40.601657 mon.arh-ibstorage1-ib mon.0 192.168.168.201:6789/0 844 : cluster [ERR] Health check update: Possible data damage: 1 pg inconsistent (PG_DAMAGED)

I have tried to follow a few instructions on PG repair, including removal of the 'broken' object .dir.default.80018061.2 from the primary osd followed by a pg repair. After that didn't work, I did the same for the secondary osd. Still the same issue.

Looking at the actual object on the file system, the file size is 0 for both the primary and secondary copies. The md5sum is the same too. The broken PG belongs to the radosgw bucket index pool .rgw.buckets.index.

What else can I try to get the thing fixed?

Cheers
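For completeness, the removal attempts mentioned above were along these lines (a rough sketch with default data paths; ceph-objectstore-tool was run with the OSD in question stopped, first on the primary and then on the secondary):

  systemctl stop ceph-osd@21
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 --pgid 18.2 \
      '.dir.default.80018061.2' remove
  systemctl start ceph-osd@21
  ceph pg repair 18.2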
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com