Robin,
Would you generate the values and keys for the various versions
of at least one of the objects? .dir.default.292886573.13181.12
is a good example because there are 3 variations for the same
object.
If there isn't much activity to .dir.default.64449186.344176, you
could do one OSD at a time. Otherwise, stop all 3 OSDs (1322, 990,
655) and execute these for all 3. I suspect you'll need to pipe to
"od -cx" to get printable output.
As an example, I created a simple object with ASCII omap:
$ ceph-objectstore-tool --data-path ... --pgid 5.3d40 \
    .dir.default.64449186.344176 get-omaphdr
obj_header
$ for i in $(ceph-objectstore-tool --data-path ... --pgid 5.3d40 \
    .dir.default.64449186.344176 list-omap)
do
    echo -n "${i}: "
    ceph-objectstore-tool --data-path ... --pgid 5.3d40 \
        .dir.default.64449186.344176 get-omap "$i"
done
key1: val1
key2: val2
key3: val3
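A minimal sketch of the same loop with every value piped through od -cx, for objects whose omap values are binary. The function name dump_omap and the COT override are illustrative and not part of ceph-objectstore-tool. The loop assumes keys contain no newlines, which may not hold for every bucket index key.

```shell
# Hypothetical helper: list the omap keys of one object and hex-dump each value.
# COT names the tool to run; the override exists only so the loop can be dry-run.
COT=${COT:-ceph-objectstore-tool}

dump_omap() {
    # $1 = OSD data path, $2 = pgid, $3 = object name
    "$COT" --data-path "$1" --pgid "$2" "$3" list-omap |
    while IFS= read -r key; do
        printf '== key: %s\n' "$key"
        # </dev/null keeps the tool from consuming the key list on stdin
        "$COT" --data-path "$1" --pgid "$2" "$3" get-omap "$key" </dev/null | od -cx
    done
}
```

With the OSD stopped, something like `dump_omap "$DATA_PATH" 5.3d40 .dir.default.64449186.344176 > osd.990.omap.txt` (DATA_PATH being that OSD's data path) would give one file per OSD that can then be compared with diff.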
David
On 9/8/17 12:18 PM, David Zafman wrote:
Robin,
The only two changesets I can spot in Jewel that I think might be
related are these:
1.
http://tracker.ceph.com/issues/20089
https://github.com/ceph/ceph/pull/15416
This should improve the repair functionality.
2.
http://tracker.ceph.com/issues/19404
https://github.com/ceph/ceph/pull/14204
This pull request fixes an issue that corrupted omaps. It also
finds and repairs them. However, the repair process might
resurrect deleted omaps which would show up as an omap digest
error.
This could temporarily cause additional inconsistent PGs. So if
this has NOT been occurring longer than your deep-scrub interval
since upgrading, I'd repair the pgs and monitor going forward to
make sure the problem doesn't recur.
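The repair-and-monitor step could be sketched as below. The function name repair_and_check and the RUN variable are illustrative; "ceph pg repair" and "ceph health detail" are the standard commands. By default the sketch only prints what it would run.

```shell
# RUN=echo (the default here) only prints the commands; set RUN= to execute them.
RUN=${RUN-echo}

repair_and_check() {
    for pg in "$@"; do
        $RUN ceph pg repair "$pg"
    done
    # Once later deep scrubs complete, any PG that is inconsistent again is listed here.
    $RUN ceph health detail
}

repair_and_check 5.3d40
```

Re-running "ceph health detail" after each deep-scrub interval is one way to confirm that the inconsistency does not come back.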
---------------
You have good examples of two repair scenarios:
.dir.default.292886573.13181.12 only has an omap_digest_mismatch
and no shard errors. The automatic repair won't be sure which is
a good copy.
In this case we can see that osd 1327 doesn't match the other
two. To help the repair process fix the right one, remove
the copy on osd.1327:
Stop osd 1327 and use "ceph-objectstore-tool --data-path .....1327
.dir.default.292886573.13181.12 remove"
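A dry-run sketch of that sequence follows. Several parts are assumptions rather than something stated in this thread: that the OSDs are managed by systemd as ceph-osd@ID units, that setting noout around the stop is wanted, and that --pgid is passed as in the list-omap example. The command quoted above is abbreviated, so a real invocation may need further options. The function name remove_bad_copy is illustrative.

```shell
# Dry run by default: RUN=echo prints each command instead of executing it.
RUN=${RUN-echo}

remove_bad_copy() {
    # $1 = OSD id, $2 = OSD data path, $3 = pgid, $4 = object name
    $RUN ceph osd set noout                 # assumption: avoid rebalancing while the OSD is down
    $RUN systemctl stop "ceph-osd@$1"       # assumption: systemd-managed OSDs
    $RUN ceph-objectstore-tool --data-path "$2" --pgid "$3" "$4" remove
    $RUN systemctl start "ceph-osd@$1"
    $RUN ceph osd unset noout
    $RUN ceph pg repair "$3"                # repair then copies the object back from the good shards
}
```

For the object above this would be called as `remove_bad_copy 1327 "$DATA_PATH" 5.f1c0 .dir.default.292886573.13181.12`, with DATA_PATH set to the data path of osd.1327. Reviewing the printed commands before setting RUN= is the point of the dry-run default.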
.dir.default.64449186.344176 has selected_object_info with "od
337cf025", so every shard has "omap_digest_mismatch_oi" except for
osd 990, whose omap_digest matches it.
The pg repair code will use osd.990 to fix the other 2 copies
without further handling.
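The comparison can be checked mechanically. The sketch below pulls the "od" value out of the selected_object_info string and reports which shard matches it. Both function names are illustrative, and the parsing assumes the string layout shown in this thread.

```shell
# Hypothetical helpers; the strings they parse follow the layout shown in this thread.

# Print the omap digest ("od ...") recorded in a selected_object_info string, if any.
oi_omap_digest() {
    printf '%s\n' "$1" | sed -n 's/.* od \([0-9a-f][0-9a-f]*\) .*/\1/p'
}

# Print the OSD ids whose digest equals the recorded one.
# $1 = digest without 0x, remaining args = osd:0xdigest pairs
shards_matching() {
    want=$1; shift
    for shard in "$@"; do
        if [ "${shard#*:}" = "0x$want" ]; then
            printf '%s\n' "${shard%%:*}"
        fi
    done
}

oi="5:02bc4def:::.dir.default.64449186.344176:head(1177700'1180639 osd.1322.0:537914 dirty|omap|data_digest|omap_digest s 0 uv 1177199 dd ffffffff od 337cf025 alloc_hint [0 0])"
shards_matching "$(oi_omap_digest "$oi")" 655:0x3242b04e 990:0x337cf025 1322:0xc90d06a8
```

For the values from pg 5.3d40 this prints 990. For an object info string without an "od" field, as with .dir.default.292886573.13181.12, oi_omap_digest prints nothing, which is the case where repair has no recorded digest to compare against.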
David
On 9/8/17 11:16 AM, Robin H. Johnson wrote:
On Thu, Sep 07, 2017 at 08:24:04PM +0000, Robin H. Johnson wrote:
pg 5.3d40 is active+clean+inconsistent, acting [1322,990,655]
pg 5.f1c0 is active+clean+inconsistent, acting [631,1327,91]
Here is the output of 'rados list-inconsistent-obj' for the PGs:
$ sudo rados list-inconsistent-obj 5.f1c0 | json_pp -json_opt canonical,pretty
{
"epoch" : 1221254,
"inconsistents" : [
{
"errors" : [
"omap_digest_mismatch"
],
"object" : {
"locator" : "",
"name" : ".dir.default.292886573.13181.12",
"nspace" : "",
"snap" : "head",
"version" : 483490
},
"selected_object_info" : "5:038f1cff:::.dir.default.292886573.13181.12:head(1221843'483490 client.417313345.0:19515832 dirty|omap|data_digest s 0 uv 483490 dd ffffffff alloc_hint [0 0])",
"shards" : [
{
"data_digest" : "0xffffffff",
"errors" : [],
"omap_digest" : "0x928b0c0b",
"osd" : 91,
"size" : 0
},
{
"data_digest" : "0xffffffff",
"errors" : [],
"omap_digest" : "0x928b0c0b",
"osd" : 631,
"size" : 0
},
{
"data_digest" : "0xffffffff",
"errors" : [],
"omap_digest" : "0x6556c868",
"osd" : 1327,
"size" : 0
}
],
"union_shard_errors" : []
}
]
}
$ sudo rados list-inconsistent-obj 5.3d40 | json_pp -json_opt canonical,pretty
{
"epoch" : 1210895,
"inconsistents" : [
{
"errors" : [
"omap_digest_mismatch"
],
"object" : {
"locator" : "",
"name" : ".dir.default.64449186.344176",
"nspace" : "",
"snap" : "head",
"version" : 1177199
},
"selected_object_info" : "5:02bc4def:::.dir.default.64449186.344176:head(1177700'1180639 osd.1322.0:537914 dirty|omap|data_digest|omap_digest s 0 uv 1177199 dd ffffffff od 337cf025 alloc_hint [0 0])",
"shards" : [
{
"data_digest" : "0xffffffff",
"errors" : [
"omap_digest_mismatch_oi"
],
"omap_digest" : "0x3242b04e",
"osd" : 655,
"size" : 0
},
{
"data_digest" : "0xffffffff",
"errors" : [],
"omap_digest" : "0x337cf025",
"osd" : 990,
"size" : 0
},
{
"data_digest" : "0xffffffff",
"errors" : [
"omap_digest_mismatch_oi"
],
"omap_digest" : "0xc90d06a8",
"osd" : 1322,
"size" : 0
}
],
"union_shard_errors" : [
"omap_digest_mismatch_oi"
]
}
]
}
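To see at a glance which shards agree, the pretty-printed output can be reduced to "osd digest" pairs. This is only a sketch: shard_digests and digest_counts are made-up names, and the awk parsing assumes the json_pp canonical,pretty layout shown here, where "omap_digest" appears before "osd" within each shard.

```shell
# Hypothetical helper: print "osd digest" pairs from json_pp canonical,pretty output.
# Relies on "omap_digest" sorting before "osd" within each shard, as in canonical output.
shard_digests() {
    awk -F'"' '
        /"omap_digest" :/ { digest = $4 }
        /"osd" :/         { id = $3; gsub(/[^0-9]/, "", id); print id, digest }
    '
}

# Count how many shards report each digest, most common first.
digest_counts() {
    shard_digests | awk '{ print $2 }' | sort | uniq -c | sort -rn
}
```

Piping `sudo rados list-inconsistent-obj 5.f1c0 | json_pp -json_opt canonical,pretty` into shard_digests would list one line per shard. A digest that digest_counts reports only once while the others agree is the odd one out, though a majority is only a hint about which copy is good.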
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com