Re: Power outages!!! help!

On 21. sep. 2017 00:35, hjcho616 wrote:
# rados list-inconsistent-pg data
["0.0","0.5","0.a","0.e","0.1c","0.29","0.2c"]
# rados list-inconsistent-pg metadata
["1.d","1.3d"]
# rados list-inconsistent-pg rbd
["2.7"]
# rados list-inconsistent-obj 0.0 --format=json-pretty
{
     "epoch": 23112,
     "inconsistents": []
}
# rados list-inconsistent-obj 0.5 --format=json-pretty
{
     "epoch": 23078,
     "inconsistents": []
}
# rados list-inconsistent-obj 0.a --format=json-pretty
{
     "epoch": 22954,
     "inconsistents": []
}
# rados list-inconsistent-obj 0.e --format=json-pretty
{
     "epoch": 23068,
     "inconsistents": []
}
# rados list-inconsistent-obj 0.1c --format=json-pretty
{
     "epoch": 22954,
     "inconsistents": []
}
# rados list-inconsistent-obj 0.29 --format=json-pretty
{
     "epoch": 22974,
     "inconsistents": []
}
# rados list-inconsistent-obj 0.2c --format=json-pretty
{
     "epoch": 23194,
     "inconsistents": []
}
# rados list-inconsistent-obj 1.d --format=json-pretty
{
     "epoch": 23072,
     "inconsistents": []
}
# rados list-inconsistent-obj 1.3d --format=json-pretty
{
     "epoch": 23221,
     "inconsistents": []
}
# rados list-inconsistent-obj 2.7 --format=json-pretty
{
     "epoch": 23032,
     "inconsistents": []
}

Looks like there is not much information there. Could you elaborate on the items you mentioned under "find the object"? How do I check the metadata? What are we looking for with md5sum?

- find the object :: manually check the objects, check the object metadata, run md5sum on them all and compare. check objects on the nonrunning osd's and compare there as well. anything to try to determine what object is ok and what is bad.

I tried the methods from "Ceph: manually repair object" <http://ceph.com/geen-categorie/ceph-manually-repair-object/> on PG 2.7 before. I tried the 3-replica case, which resulted in a missing shard regardless of which copy I moved. For the 2-replica case, hmm... I guess I don't know how long "wait a bit" is. I just turned the OSD back on after a minute or so, and it just returned to the same inconsistent message. =P Are we waiting for the entire stopped OSD to be remapped to a different OSD, so that we get 3 replicas when the stopped OSD is started again?

Regards,
Hong


since your list-inconsistent-obj output is empty, you need to raise the debug level on all osd's and grep the logs to find the objects with issues. this is explained in the link. ceph pg map [pg] tells you which osd's to look at, and the log will have hints about the reason for the error. keep in mind that it may have been a while since the scrub reported the error, so you may need to look at older logs. or trigger a scrub and wait for it to finish, so you can check the current log.
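
A rough sketch of that log search, not a definitive recipe. It assumes the osd logs live under /var/log/ceph/ and that scrub errors are logged on lines containing "[ERR]" together with the pg id, as in the linked article; both are assumptions about your setup. The helper name pg_scrub_errors is made up for illustration.

```shell
# Hypothetical helper (not part of ceph): print log lines that mention both
# the given pg id and the "[ERR]" marker.
# usage: pg_scrub_errors <pgid> <logfile>...
pg_scrub_errors() {
    pgid="$1"; shift
    # -w keeps "2.7" from matching "12.7" or "2.7f"; -F treats patterns as plain strings
    grep -h -w -F -- "$pgid" "$@" | grep -F -- "[ERR]"
}

# Example, run on each host holding an osd listed by "ceph pg map 2.7":
#   ceph pg deep-scrub 2.7      # optional: trigger a fresh scrub first
#   pg_scrub_errors 2.7 /var/log/ceph/ceph-osd.*.log
```

The lines it prints should name the object (and usually the shard/osd) that the scrub complained about. If nothing comes back, the error is probably in an older, rotated log.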

once you have the object names you can find them with the find command.
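
for example, something along these lines. This is only a sketch and assumes filestore osd's, where each pg's objects are plain files under /var/lib/ceph/osd/ceph-<id>/current/<pgid>_head/ (the layout used in the linked article). The helper name object_checksums is hypothetical, and it only sees osd's mounted on the host where you run it, so repeat it on each host.

```shell
# Hypothetical helper (not part of ceph): checksum every on-disk copy of an
# object so that the copy which differs from the others stands out.
# usage: object_checksums <object-name-fragment> <pg-directory>...
object_checksums() {
    frag="$1"; shift
    # find matching files below the given directories and md5sum them in one go
    find "$@" -type f -name "*${frag}*" -exec md5sum {} +
}

# Example (assumed filestore paths, pg 2.7):
#   object_checksums <object-name> /var/lib/ceph/osd/ceph-*/current/2.7_head
```

copies with the same checksum are most likely the good ones. a copy with a different checksum, a zero size, or one that is missing altogether is the suspect.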

after removing/fixing the broken object and restarting the osd, you issue the repair, and wait for the repair and scrub of that pg to finish. you can probably follow along by tailing the log.
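
as an alternative to watching the log by hand, here is a minimal sketch that issues the repair and then polls the cluster. It assumes "ceph health detail" lists a damaged pg on a line of the form "pg <id> is ...inconsistent...", which may differ between releases. The helper name repair_and_wait and the 30 second default interval are made up for illustration.

```shell
# Hypothetical helper (not part of ceph): issue a repair, then poll until the
# pg is no longer reported as inconsistent.
# usage: repair_and_wait <pgid> [poll-seconds]
repair_and_wait() {
    pgid="$1"; interval="${2:-30}"
    ceph pg repair "$pgid" || return 1
    # assumed output format: "pg <id> is active+clean+inconsistent, acting [...]"
    while ceph health detail | grep -F -- "pg $pgid is" | grep -q inconsistent; do
        sleep "$interval"
    done
    echo "pg $pgid no longer reported inconsistent"
}

# Example:
#   repair_and_wait 2.7
```

if the pg goes back to inconsistent on the next scrub, the bad copy is still there and you need to look at the logs again.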

good luck
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


