Re: MDS damaged

If you were following this page:
http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-pg/


then the following paragraph normally stands for hours of troubleshooting before you finally admit defeat and mark the object as lost:

"It is possible that there are other locations where the object can exist that are not listed. For example, if a ceph-osd is stopped and taken out of the cluster, the cluster fully recovers, and due to some future set of failures ends up with an unfound object, it won’t consider the long-departed ceph-osd as a potential location to consider. (This scenario, however, is unlikely.)"
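The read-only checks that page describes can be sketched roughly as below. This is only a sketch: the `inspect_unfound` function name and the `CEPH`/`PGID` variables are my own, the PG id is the one from this thread, and `list_missing` is the Jewel spelling (newer releases call it `list_unfound`).

```shell
# Sketch: read-only checks to run before declaring an object lost.
# PGID and CEPH are illustrative variables; set CEPH=echo for a dry run.
PGID="${PGID:-1.37c}"
CEPH="${CEPH:-ceph}"

inspect_unfound() {
    # Show which PGs report unfound objects, and how many.
    "$CEPH" health detail
    # List the unfound objects in this PG (Jewel syntax).
    "$CEPH" pg "$PGID" list_missing
    # Dump PG state; recovery_state -> might_have_unfound lists OSDs
    # that may still hold a copy and whether they have been queried.
    "$CEPH" pg "$PGID" query
}
```

Run `inspect_unfound` on a node with an admin keyring. If `might_have_unfound` shows an OSD that is down or was never queried, bringing that OSD back is worth trying before anything destructive.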


This warning about losing objects is also important:
"Use this with caution, as it may confuse applications that expected the object to exist."

The MDS is definitely such an application. I think RGW is the only application where losing an object could be acceptable, depending on what was using the object storage. RBD and CephFS will have issues of varying degree. One could argue that the mark_unfound_lost command should have a --yes-i-mean-it type of warning, especially if the pool application is cephfs or rbd.
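Until something like that exists, a local wrapper can add the confirmation step. The function below is hypothetical and not a Ceph feature: the name `mark_unfound_lost_guarded` and the `CEPH` variable are my own, and it only checks for the flag, not the pool application.

```shell
# Hypothetical wrapper, not part of Ceph: refuse to mark objects lost
# unless the caller explicitly passes --yes-i-mean-it.
CEPH="${CEPH:-ceph}"

mark_unfound_lost_guarded() {
    pgid="$1"      # e.g. 1.37c
    mode="$2"      # revert or delete
    confirm="$3"   # must be --yes-i-mean-it
    if [ "$confirm" != "--yes-i-mean-it" ]; then
        echo "refusing: pass --yes-i-mean-it to accept data loss in pg $pgid" >&2
        return 1
    fi
    "$CEPH" pg "$pgid" mark_unfound_lost "$mode"
}
```

For example, `mark_unfound_lost_guarded 1.37c delete` fails with a message, while `mark_unfound_lost_guarded 1.37c delete --yes-i-mean-it` runs the real command.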


This is of course a bit late now that the object is marked as lost, but for future reference: since you had an inconsistent PG, most likely you had one corrupt copy of the object and one or more good copies on other OSDs. Using the methods described in http://ceph.com/geen-categorie/ceph-manually-repair-object/ might have recovered that object for you.
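A first step in that direction is finding out which copy is the bad one. The sketch below is not the linked article's exact procedure; the `locate_bad_copy` name and the `CEPH`/`RADOS`/`PGID` variables are my own, and `rados list-inconsistent-obj` only returns data if the PG has been scrubbed recently.

```shell
# Sketch: identify which OSD holds the inconsistent copy in a PG.
# PGID, CEPH and RADOS are illustrative variables; set both tools to
# echo for a dry run.
PGID="${PGID:-1.37c}"
CEPH="${CEPH:-ceph}"
RADOS="${RADOS:-rados}"

locate_bad_copy() {
    # Show the up and acting OSD sets for the PG.
    "$CEPH" pg map "$PGID"
    # Show per-object, per-shard errors recorded by the last scrub.
    "$RADOS" list-inconsistent-obj "$PGID" --format=json-pretty
}
```

With the bad shard identified, you can compare the copies on the individual OSDs and decide how to repair, rather than letting the cluster give up on the object.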

kind regards
Ronny Aasen



On 26 Oct. 2017 04:38, danield@xxxxxxxxxxxxxxxx wrote:
Hi Ronny,

From the documentation, I thought this was the proper way to resolve the
issue.

Dan

On 24 Oct. 2017 19:14, Daniel Davidson wrote:
Our ceph system is having a problem.

A few days ago we had a pg that was marked as inconsistent, and today I
fixed it with:

#ceph pg repair 1.37c

Then a file was stuck as missing, so I ran:

#ceph pg 1.37c mark_unfound_lost delete
pg has 1 objects unfound and apparently lost marking

Sorry, I cannot assist with the corrupt MDS part; I have no experience in
that area.

But I felt this escalated a bit quickly. Since this is an "I accept the lost
object" type of command, the consequences can be quite ugly, depending on
what the missing object was for. Did you do much troubleshooting before
jumping to this command, so that you were certain there were no other
options that avoided data loss?

kind regards
Ronny Aasen



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



