Hi all, I need some help getting my 0.87.1 cluster back into a healthy
state...
Overnight, a deep scrub detected an inconsistent object in one PG.
"ceph health detail" reported the following:
# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
pg 2.3b is active+clean+inconsistent, acting [1,2,0]
2 scrub errors
And these were the corresponding errors from the log:
2015-05-03 02:47:27.804774 6a8bc3f1e700 -1 log_channel(default) log
[ERR] : 2.3b shard 1: soid
c886da7b/rbd_data.25212ae8944a.0000000000000100/head//2 digest
1859582522 != known digest 2859280481, size 4194304 != known size 1642496
2015-05-03 02:47:44.099475 6a8bc3f1e700 -1 log_channel(default) log
[ERR] : 2.3b deep-scrub stat mismatch, got 655/656 objects, 0/0 clones,
655/656 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts,
2685746176/2689940480 bytes,0/0 hit_set_archive bytes.
2015-05-03 02:47:44.099496 6a8bc3f1e700 -1 log_channel(default) log
[ERR] : 2.3b deep-scrub 0 missing, 1 inconsistent objects
2015-05-03 02:47:44.099501 6a8bc3f1e700 -1 log_channel(default) log
[ERR] : 2.3b deep-scrub 2 errors
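A quick sanity check on the numbers in the stat mismatch line, as a minimal Python sketch. The regex and the helper name stat_deltas are only for illustration and are not part of any Ceph tooling; the log line is copied from above with the line wrapping removed.

```python
import re

# The deep-scrub stat mismatch message from the log above, unwrapped.
STAT_LINE = (
    "2.3b deep-scrub stat mismatch, got 655/656 objects, 0/0 clones, "
    "655/656 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, "
    "2685746176/2689940480 bytes,0/0 hit_set_archive bytes."
)


def stat_deltas(line):
    """Return {counter name: second value - first value} for each 'a/b name' pair."""
    deltas = {}
    pattern = r"(\d+)/(\d+) ([a-z_]+(?: bytes)?)"
    for first, second, name in re.findall(pattern, line):
        deltas[name] = int(second) - int(first)
    return deltas


if __name__ == "__main__":
    for name, delta in stat_deltas(STAT_LINE).items():
        print(f"{name}: {delta}")
```

The two sides differ by 1 object and by 4194304 bytes, which is the same figure the first error reports as the size of the copy on shard 1.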
I located the inconsistent object on disk on all 3 OSDs (and have saved
a copy of each). The copies on OSDs 0 and 2 match each other and have the
"known size" of 1642496 bytes. The copy on OSD 1 (the primary) has
additional data appended, giving it a size of 4194304 bytes. The first
1642496 bytes of OSD 1's copy are identical to the copies on OSDs 0 and
2; it just has extra data after that.
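For anyone wanting to repeat the comparison on saved copies of the object files, this is roughly the check involved, as a minimal Python sketch. The function name is_prefix_of and the paths passed to it are hypothetical; point it at wherever the copies were saved.

```python
import os


def is_prefix_of(short_path, long_path, chunk_size=1 << 20):
    """Return True if the file at short_path is byte-for-byte a prefix of long_path."""
    remaining = os.path.getsize(short_path)
    if remaining > os.path.getsize(long_path):
        return False
    with open(short_path, "rb") as short_f, open(long_path, "rb") as long_f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            a = short_f.read(n)
            b = long_f.read(n)
            if not a or a != b:
                # Either the contents differ or a file shrank while being read.
                return False
            remaining -= len(a)
    return True
```

Called as is_prefix_of(copy_from_osd0, copy_from_osd1), it returns True when the shorter copy matches the start of the longer one, which is the situation described above.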
As this is part of an RBD image (used by a Linux VM, with a filesystem
on top), I reasoned that if the "extra data" on OSD 1's copy of the
object is not supposed to be there, then it almost certainly maps to an
unallocated part of the filesystem within the VM, so having the extra
data isn't going to do any harm. So I want to stick with the version on
OSD 1 (the primary).
I then ran "ceph pg repair 2.3b", as my understanding is that this
should replace the copies of the object on OSDs 0 and 2 with the one
from the primary OSD, achieving what I want and removing the
inconsistency. However, that doesn't seem to have happened!
Instead I now have 1 unfound object (and it is the same object that had
previously been reported as inconsistent), and some IO is now being blocked:
# ceph health detail
HEALTH_WARN 1 pgs recovering; 1 pgs stuck unclean; 1 requests are
blocked > 32 sec; 1 osds have slow requests; recovery -1/1747956 objects
degraded (-0.000%); 1/582652 unfound (0.000%)
pg 2.3b is stuck unclean for 533.238307, current state
active+recovering, last acting [1,2,0]
pg 2.3b is active+recovering, acting [1,2,0], 1 unfound
1 ops are blocked > 524.288 sec
1 ops are blocked > 524.288 sec on osd.1
1 osds have slow requests
recovery -1/1747956 objects degraded (-0.000%); 1/582652 unfound (0.000%)
# ceph pg 2.3b list_missing
{ "offset": { "oid": "",
      "key": "",
      "snapid": 0,
      "hash": 0,
      "max": 0,
      "pool": -1,
      "namespace": ""},
  "num_missing": 1,
  "num_unfound": 1,
  "objects": [
        { "oid": { "oid": "rbd_data.25212ae8944a.0000000000000100",
              "key": "",
              "snapid": -2,
              "hash": 3364280955,
              "max": 0,
              "pool": 2,
              "namespace": ""},
          "need": "1216'8088646",
          "have": "0'0",
          "locations": []}],
  "more": 0}
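To double-check that the unfound object really is the one from the scrub error, the two outputs can be compared mechanically. This is a minimal Python sketch: the helper names are made up for illustration, the JSON is trimmed to the fields used, and it assumes that the leading field of the soid string in the log is the object hash in hexadecimal (the numbers do line up: 0xc886da7b is 3364280955).

```python
import json

# "ceph pg 2.3b list_missing" output from above, trimmed to the fields used here.
LIST_MISSING = """
{ "num_missing": 1,
  "num_unfound": 1,
  "objects": [
    { "oid": { "oid": "rbd_data.25212ae8944a.0000000000000100",
               "snapid": -2,
               "hash": 3364280955,
               "pool": 2},
      "need": "1216'8088646",
      "have": "0'0",
      "locations": []}]}
"""

# soid string from the deep-scrub error in the log.
SCRUB_SOID = "c886da7b/rbd_data.25212ae8944a.0000000000000100/head//2"


def unfound_objects(list_missing_json):
    """Return (object name, hash) pairs for entries with no known locations."""
    data = json.loads(list_missing_json)
    return [
        (obj["oid"]["oid"], obj["oid"]["hash"])
        for obj in data["objects"]
        if not obj["locations"]
    ]


def split_soid(soid):
    """Split 'hexhash/name/...' into (hash as int, name).

    Assumes the first field is a hex hash and the object name contains no '/'.
    """
    hex_hash, name = soid.split("/")[:2]
    return int(hex_hash, 16), name


if __name__ == "__main__":
    scrub_hash, scrub_name = split_soid(SCRUB_SOID)
    for name, obj_hash in unfound_objects(LIST_MISSING):
        same = (name, obj_hash) == (scrub_name, scrub_hash)
        print(name, obj_hash, "matches scrub error" if same else "differs")
```

Both the name and the hash agree, so the unfound object and the inconsistent object are the same one.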
However, all 3 OSDs still have the corresponding file on disk, with the
same content as when I first looked at them. I can only assume that
because the object on the primary OSD didn't match the "known size",
when I issued the "repair" Ceph somehow decided to invalidate the
primary's copy rather than use it as the authoritative version, and now
believes it has no good copies of the object.
How can I persuade Ceph to just go ahead and use the version of
rbd_data.25212ae8944a.0000000000000100 that is already on-disk on OSD 1,
and push it out to OSDs 0 and 2? Surely there is a way to do that!
Thanks in advance!
Alex