On 21. sep. 2017 00:35, hjcho616 wrote:
# rados list-inconsistent-pg data
["0.0","0.5","0.a","0.e","0.1c","0.29","0.2c"]
# rados list-inconsistent-pg metadata
["1.d","1.3d"]
# rados list-inconsistent-pg rbd
["2.7"]
# rados list-inconsistent-obj 0.0 --format=json-pretty
{
"epoch": 23112,
"inconsistents": []
}
# rados list-inconsistent-obj 0.5 --format=json-pretty
{
"epoch": 23078,
"inconsistents": []
}
# rados list-inconsistent-obj 0.a --format=json-pretty
{
"epoch": 22954,
"inconsistents": []
}
# rados list-inconsistent-obj 0.e --format=json-pretty
{
"epoch": 23068,
"inconsistents": []
}
# rados list-inconsistent-obj 0.1c --format=json-pretty
{
"epoch": 22954,
"inconsistents": []
}
# rados list-inconsistent-obj 0.29 --format=json-pretty
{
"epoch": 22974,
"inconsistents": []
}
# rados list-inconsistent-obj 0.2c --format=json-pretty
{
"epoch": 23194,
"inconsistents": []
}
# rados list-inconsistent-obj 1.d --format=json-pretty
{
"epoch": 23072,
"inconsistents": []
}
# rados list-inconsistent-obj 1.3d --format=json-pretty
{
"epoch": 23221,
"inconsistents": []
}
# rados list-inconsistent-obj 2.7 --format=json-pretty
{
"epoch": 23032,
"inconsistents": []
}
Looks like there is not much information there. Could you elaborate on
the items you mentioned under "find the object"? How do I check the
metadata? What are we looking for in the md5sum output?
- find the object :: manually check the objects, check the object
metadata, run md5sum on them all and compare. Check the objects on the
non-running OSDs and compare there as well. Anything to try to determine
which object is OK and which is bad.
I tried the methods from "Ceph: manually repair object"
<http://ceph.com/geen-categorie/ceph-manually-repair-object/> on
PG 2.7 before. In the 3-replica case the result was a missing shard,
regardless of which copy I moved. In the 2-replica case, hmm... I guess
I don't know how long "wait a bit" is. I just turned the OSD back on
after a minute or so, and it went back to the same inconsistent
message. =P Are we waiting for the entire stopped OSD to be mapped to a
different OSD, so that we get 3 replicas when the stopped OSD is running
again?
Regards,
Hong
Since your list-inconsistent-obj output is empty, you need to raise the
debug level on all OSDs and grep the logs to find the objects with
issues. This is explained in the link. "ceph pg map [pg]" tells you
which OSDs to look at, and the log will have hints about the reason for
the error. Keep in mind that it may have been a while since the scrub
reported the errors, so you may need to look at older logs. Or trigger a
scrub, and wait for it to finish so you can check the current log.
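
The log search described above could be scripted roughly like this. It is
only a sketch: the /var/log/ceph location, the ceph-osd.*.log file names
and the "[ERR]" marker next to the PG id are assumptions about a default
install, the function names are made up for illustration, and debug level
20 is just an example value.

```shell
#!/usr/bin/env bash
# Sketch only. Assumed: default log directory, ceph-osd.<id>.log file names,
# and that scrub errors are logged with an "[ERR]" marker next to the PG id.

# Print OSD log lines that report an error for one PG.
# usage: scrub_errors_for_pg <pgid> [logdir]
scrub_errors_for_pg() {
    local pgid="$1"
    local logdir="${2:-/var/log/ceph}"
    # -H keeps the file name, so you can see which OSD logged the error.
    # -F matches literally, so the dot in "2.7" is not a regex wildcard.
    grep -H -F '[ERR]' "$logdir"/ceph-osd.*.log 2>/dev/null \
        | grep -F -- " ${pgid} "
}

# The cluster-side steps, wrapped in a function so nothing runs on its own.
# usage: rescrub_pg <pgid> <osd-id>
rescrub_pg() {
    local pgid="$1" osd="$2"
    ceph pg map "$pgid"                                 # which OSDs hold the PG
    ceph tell "osd.${osd}" injectargs '--debug-osd 20'  # more verbose OSD log
    ceph pg deep-scrub "$pgid"                          # queue a fresh deep scrub
}
```

For example, "rescrub_pg 2.7 3" followed later by "scrub_errors_for_pg 2.7"
on each host that "ceph pg map" listed (PG 2.7 and osd.3 are placeholders).
If the logs have been rotated, the older files would need to be searched
as well.
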
Once you have the object names you can find them with the find command.
After removing/fixing the broken object and restarting the OSD, you issue
the repair, and wait for the repair and scrub of that PG to finish. You
can probably follow along by tailing the log.
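
As for what to look for in md5sum: the replicas of one object should all
have the same checksum, so a copy that differs from the others is the
suspect. A rough sketch of the find + md5sum step, assuming FileStore OSDs
with the usual <osd-dir>/current/<pgid>_head layout (the function name and
the example path are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch only. Assumed: FileStore layout <osd-dir>/current/<pgid>_head/...
# and that the on-disk file name contains the object name from the log.

# Print "<md5>  <path>" for each on-disk file whose name contains the name.
# usage: object_checksums <osd-dir> <pgid> <object-name>
object_checksums() {
    local osd_dir="$1" pgid="$2" name="$3"
    find "$osd_dir/current/${pgid}_head" -type f -name "*${name}*" \
        -exec md5sum {} +
}

# Example (placeholders): run on each host that holds a replica, then compare.
#   object_checksums /var/lib/ceph/osd/ceph-3 2.7 <object-name>
# After fixing the bad copy and restarting that OSD:
#   ceph pg repair 2.7
```

If the checksums all match, the difference may be in the object metadata
rather than the data, so the log message for that object is still the best
hint.
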
good luck
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com