On Fri, Jan 9, 2015 at 2:00 AM, Nico Schottelius
<nico-ceph-users@xxxxxxxxxxxxxxx> wrote:
> Lionel, Christian,
>
> we have exactly the same trouble as Christian, namely
>
> Christian Eichelmann [Fri, Jan 09, 2015 at 10:43:20AM +0100]:
>> We still don't know what caused this specific error...
>
> and
>
>> ...there is currently no way to make ceph forget about the data of
>> this pg and create it as an empty one. So the only way to make this
>> pool usable again is to lose all your data in there.
>
> I wonder what is the position of ceph developers regarding
> dropping (emptying) specific pgs?
> Is that a use case that was never thought of or tested?

I've never worked directly on any of the clusters this has happened to,
but I believe that every time we've seen issues like this with somebody
we have a relationship with, it's either:
1) been resolved by using the existing tools to mark stuff lost, or
2) been the result of local filesystems/disks silently losing data due
   to some fault or other.

The second case means the OSDs have corrupted state, and trusting them
is tricky. Also, most people we've had relationships with that this has
happened to really don't want to lose all the data in the PG, which
necessitates manually mucking around anyway. ;)

Mailing list issues are obviously a lot harder to categorize, but the
ones we've taken time on where people say the commands don't work have
generally fallen into the second bucket.

If you want to experiment, I think all the manual mucking around has
been done with the objectstore tool (removing bad PGs, moving them
around, or faking journal entries), but I've not done it myself, so I
could be mistaken.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
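
For anyone following up on this thread, a rough sketch of the two paths
Greg mentions. This is purely illustrative: the pg id 2.5, the OSD ids
12 and 34, and the export file name are made up for the example, the
OSD must be stopped before running the objectstore tool against it, and
the binary name (ceph_objectstore_tool vs. ceph-objectstore-tool) and
exact flags vary by release, so check the documentation for your
version before trying any of this on real data.

    # 1) The "existing tools": give up on unfound objects in a PG
    ceph pg 2.5 mark_unfound_lost revert   # roll back to prior object versions
    ceph pg 2.5 mark_unfound_lost delete   # or forget the unfound objects entirely

    # 2) Manual mucking around with the objectstore tool
    #    (run against a stopped OSD; always export a copy before removing)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 2.5 --op export --file /tmp/pg2.5.export
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 2.5 --op remove
    # ...and, if a good copy exists, import it into another OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
        --journal-path /var/lib/ceph/osd/ceph-34/journal \
        --op import --file /tmp/pg2.5.export

Both paths are destructive, which is why the thread keeps stressing
backups/exports before removing anything.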