If you were following this page:
http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-pg/
then there are normally hours of troubleshooting before you reach the
following paragraph and finally admit defeat and mark the object as lost:
"It is possible that there are other locations where the object can
exist that are not listed. For example, if a ceph-osd is stopped and
taken out of the cluster, the cluster fully recovers, and due to some
future set of failures ends up with an unfound object, it won’t consider
the long-departed ceph-osd as a potential location to consider. (This
scenario, however, is unlikely.)"
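The steps that come before that paragraph on the page include checking `ceph pg <pgid> query`. Its `recovery_state` section lists, under `might_have_unfound`, the OSDs that might still hold the unfound object, each with a status. The sketch below picks out the locations that are not yet marked "already probed". Those are the OSDs worth bringing back before considering `mark_unfound_lost`. It assumes the JSON layout shown in the troubleshooting docs, and the helper name `unprobed_locations` is hypothetical. It only parses text you pass in (for example the saved output of `ceph pg 1.37c query`). It does not talk to the cluster.

```python
import json


def unprobed_locations(pg_query_json: str) -> list:
    """Return might_have_unfound entries that have not been fully probed.

    Assumes the `ceph pg <pgid> query` layout shown in the Ceph
    troubleshooting docs: a "recovery_state" list whose entries may carry
    a "might_have_unfound" list of {"osd": <id>, "status": <text>} dicts.
    """
    data = json.loads(pg_query_json)
    pending = []
    for state in data.get("recovery_state", []):
        for entry in state.get("might_have_unfound", []):
            # "already probed" means the primary asked this OSD and it did
            # not have the object; any other status (e.g. "osd is down")
            # marks a location that may still be worth recovering.
            if entry.get("status") != "already probed":
                pending.append(entry)
    return pending
```

An empty result means every location the cluster knows about has already been asked. The quoted paragraph warns that even then, a long-departed OSD may not be on the list at all.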
This warning about losing objects is also important:
"Use this with caution, as it may confuse applications that expected the
object to exist."
The MDS is definitely such an application. I think rgw would be the only
application where losing an object could be acceptable, depending on what
uses the object storage. rbd and cephfs will have issues of varying
degree. One could argue that the mark_unfound_lost command should have a
--yes-i-mean-it type of warning, especially if the pool application is
cephfs or rbd.
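The sketch below is a hypothetical Python wrapper that makes the suggestion concrete. It is not a Ceph feature, and `mark_unfound_lost_cmd` and `RISKY_APPLICATIONS` are made-up names. The wrapper only builds the `ceph pg <pgid> mark_unfound_lost <revert|delete>` command line. It refuses to do so for pools tagged cephfs or rbd unless the caller explicitly confirms. The pool's application tags are assumed to come from the caller, for example from `ceph osd pool application get <pool>` on releases that have pool application tags.

```python
# Hypothetical guard around `ceph pg <pgid> mark_unfound_lost`; this is an
# illustration of the proposal, not part of the ceph CLI.

# Pool applications where losing one RADOS object can damage higher-level
# structures (filesystem metadata, block images).
RISKY_APPLICATIONS = {"cephfs", "rbd"}


def mark_unfound_lost_cmd(pgid, mode, pool_apps, yes_i_mean_it=False):
    """Return the argv for mark_unfound_lost, or raise if unconfirmed.

    pgid      -- placement group id, e.g. "1.37c"
    mode      -- "revert" or "delete"
    pool_apps -- application tags of the pool the PG belongs to
    """
    if mode not in ("revert", "delete"):
        raise ValueError("mode must be 'revert' or 'delete'")
    risky = RISKY_APPLICATIONS.intersection(pool_apps)
    if risky and not yes_i_mean_it:
        raise PermissionError(
            f"pool is used by {', '.join(sorted(risky))}; "
            "pass yes_i_mean_it=True to accept possible data loss")
    return ["ceph", "pg", pgid, "mark_unfound_lost", mode]
```

For an rgw-only pool the command is built directly. For a cephfs pool it raises unless `yes_i_mean_it=True` is passed. The wrapper deliberately never runs the command itself, so the decision stays with the operator.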
This is of course a bit late now that the object is marked as lost, but
for your future reference: since you had an inconsistent PG, most likely
you had one corrupt copy of the object and one or more good copies on
other OSDs, and using the methods described in
http://ceph.com/geen-categorie/ceph-manually-repair-object/ might have
recovered that object for you.
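Before repairing by hand, it helps to know which OSD holds the bad copy. On releases that provide it, `rados list-inconsistent-obj <pgid> --format=json` reports per-shard errors found by the last deep scrub. The sketch below groups those shards into candidate good and bad OSDs for each object. It assumes the field names (`inconsistents`, `object.name`, `shards`, `osd`, `errors`) from the example output in the Ceph documentation, which may differ between releases. `shard_report` is a hypothetical helper name. It only identifies candidates; the actual repair is still the manual procedure in the linked article.

```python
import json


def shard_report(inconsistent_json: str) -> dict:
    """Map each inconsistent object name to its candidate good/bad OSDs.

    Assumes the layout printed by
    `rados list-inconsistent-obj <pgid> --format=json`: an "inconsistents"
    list whose entries carry "object" -> "name" and a "shards" list of
    {"osd": <id>, "errors": [...]} dicts.
    """
    data = json.loads(inconsistent_json)
    report = {}
    for item in data.get("inconsistents", []):
        good, bad = [], []
        for shard in item.get("shards", []):
            # A shard with an empty error list is a candidate healthy copy;
            # any reported shard error puts the OSD on the suspect list.
            (bad if shard.get("errors") else good).append(shard["osd"])
        report[item["object"]["name"]] = {"good": good, "bad": bad}
    return report
```

If every shard of an object lands in "good" while the object itself still has errors (for example, a mismatch against the recorded object info), the report alone cannot tell you which copy is authoritative. Inspect the copies before removing anything.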
kind regards
Ronny Aasen
On 26. okt. 2017 04:38, danield@xxxxxxxxxxxxxxxx wrote:
Hi Ronny,
From the documentation, I thought this was the proper way to resolve the
issue.
Dan
On 24. okt. 2017 19:14, Daniel Davidson wrote:
Our ceph system is having a problem.
A few days ago we had a PG that was marked as inconsistent, and today I
fixed it with:
#ceph pg repair 1.37c
Then a file was stuck as missing, so I ran:
#ceph pg 1.37c mark_unfound_lost delete
pg has 1 objects unfound and apparently lost marking
Sorry, I cannot assist with the corrupt MDS part; I have no experience
in that area.
But I felt this escalated a bit quickly. Since this is an "I accept the
lost object" type of command, the consequences can be quite ugly,
depending on what the missing object was used for. Did you do much
troubleshooting before jumping to this command, so that you were certain
there were no other options that avoid data loss?
kind regards
Ronny Aasen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com