Re: Ceph PG Incomplete = Cluster unusable

On Mon, Dec 29, 2014 at 12:56 PM, Christian Eichelmann
<christian.eichelmann@xxxxxxxx> wrote:
> Hi all,
>
> we have a Ceph cluster with currently 360 OSDs across 11 systems. Last week
> we were replacing one OSD system with a new one. During that, we had a
> lot of problems with OSDs crashing on all of our systems, but that is
> not our current problem.
>
> After we got everything up and running again, we still have 3 PGs in the
> state incomplete. I was checking one of them directly on the systems
> (replication factor is 3). On two machines the directory was there but
> empty; on the third one I found some content. Using
> ceph_objectstore_tool I exported this PG and imported it on the other
> nodes (a command sketch follows the quoted message). Nothing changed.
>
> We only use Ceph for providing RBD images. Right now, two of them are
> unusable, because Ceph hangs when someone tries to access content in
> these PGs. Even worse, if I create a new RBD image, Ceph still places
> data on the incomplete PGs, so it is pure gambling whether a new volume
> will be usable or not. That, for now, makes our 900 TB Ceph cluster
> unusable because of 3 bad PGs.
>
> And right here it seems like I can't do anything. Instructing the Ceph
> cluster to scrub, deep-scrub, or repair the PGs does nothing, even after
> several days. Checking which RBD images are affected is also not
> possible, because "rados -p poolname ls" hangs forever when it reaches
> one of the incomplete PGs. "ceph osd lost" also does nothing.
>
> So right now, I am OK with losing the content of these three PGs. How
> can I get the cluster back to life without deleting the whole pool,
> which is not up for discussion?
>
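For context, the export/import workflow mentioned in the quoted message
usually looks roughly like the sketch below. This is only a sketch: the OSD
ids, paths, and PG id (osd.42, osd.17, pg 3.18f) are placeholder
assumptions, and the Giant-era binary is called ceph_objectstore_tool
(later renamed ceph-objectstore-tool). The OSDs involved must be stopped
while the tool runs.

    # stop the OSD that still holds data for the PG before touching its store
    service ceph stop osd.42

    # export the PG from the surviving OSD (paths and pgid are placeholders)
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-42 \
        --journal-path /var/lib/ceph/osd/ceph-42/journal \
        --pgid 3.18f --op export --file /tmp/pg.3.18f.export

    # on a stopped target OSD, drop any (empty) copy of the PG that is
    # already there, then import the export file and restart the OSD
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-17 \
        --journal-path /var/lib/ceph/osd/ceph-17/journal \
        --pgid 3.18f --op remove
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-17 \
        --journal-path /var/lib/ceph/osd/ceph-17/journal \
        --op import --file /tmp/pg.3.18f.export
    service ceph start osd.17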


Christian, would you mind providing an exact backtrace for those
crashes from a core file? This clearly represents one of my worst
nightmares, a domino crash of a healthy cluster, and even for an
unstable release such as Giant the issue should at least be properly
pinned down. I also suspect that you have an almost empty cluster or a
very low number of volumes, since only two volumes are affected in your
case. If you don't care about your data, then after obtaining the core
dump you may want to try marking those PGs as lost, as the operational
guide suggests (a command sketch follows below).
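
For the record, "marking those PGs as lost" would look something like the
sketch below, assuming one of the incomplete PGs is 3.18f and the peering
blocker is osd.42 (both placeholders, taken from "ceph pg ... query"
output). These commands deliberately discard data, so they are only an
option once the contents of those PGs are truly written off.

    # inspect why the PG is incomplete; look at "down_osds_we_would_probe"
    # and "peering_blocked_by" in the output
    ceph pg 3.18f query

    # declare the blocking OSD's data permanently gone
    ceph osd lost 42 --yes-i-really-mean-it

    # if objects are then reported as unfound, revert or forget them
    ceph pg 3.18f mark_unfound_lost revert

    # last resort: recreate the PG empty (everything stored in it is lost)
    ceph pg force_create_pg 3.18f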
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


