Re: Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

On 12/30/14 16:36, Nico Schottelius wrote:
> Good evening,
>
> we also tried to rescue data *from* our old / broken pool by map'ing the
> rbd devices, mounting them on a host and rsync'ing away as much as
> possible.
>
> However, after some time rsync got completely stuck and eventually the
> host which mounted the rbd mapped devices decided to kernel panic at
> which time we decided to drop the pool and go with a backup.
>
> This story and the one of Christian makes me wonder:
>
>     Is anyone using ceph as a backend for qemu VM images in production?

Yes, with Ceph 0.80.5, in production since September after extensive
testing over several months (including an earlier version, IIRC) and
some hardware failure simulations. We plan to upgrade one storage host
and one monitor to 0.80.7 to validate that version over several months
too before migrating the others.
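
For context, a VM disk on RBD usually reaches qemu through librbd. The
sketch below (Python driving qemu via subprocess) only shows the general
shape; the pool/image names, the cephx user and the guest parameters are
placeholders, not our actual configuration:

#!/usr/bin/env python
# Minimal sketch of booting a qemu/KVM guest whose disk lives on RBD.
# Pool "rbd", image "vm-disk-1", cephx user "admin" and the memory size
# are placeholders; adjust them to your cluster and guest.
import subprocess

rbd_drive = "rbd:rbd/vm-disk-1:id=admin:conf=/etc/ceph/ceph.conf"

subprocess.check_call([
    "qemu-system-x86_64",
    "-enable-kvm",
    "-m", "2048",
    # librbd-backed virtio disk; writeback caching is fine here because
    # librbd honours the flush requests issued by the guest.
    "-drive", "format=raw,file=%s,if=virtio,cache=writeback" % rbd_drive,
])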

>
> And:
>
>     Has anyone on the list been able to recover from a pg incomplete /
>     stuck situation like ours?

Only by adding back an OSD holding the data needed to reach min_size
for said pg, which is the expected behavior. Even with some experiments
with isolated unstable OSDs I've not yet witnessed a case where Ceph
lost multiple replicas simultaneously (we lost one OSD to a disk failure
and another to a BTRFS bug, but we didn't try to recover the filesystem,
so we might have been able to recover that OSD).
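
For reference, the checks we do when a pg goes incomplete are easy to
script. Here is a rough sketch (Python shelling out to the ceph CLI,
with the pool name "rbd" as a placeholder), not our exact tooling:

#!/usr/bin/env python
# Rough sketch: list stuck/inactive pgs and a pool's min_size.
# Assumes the ceph CLI and a client keyring that can reach the cluster;
# the pool name "rbd" is a placeholder.
import json
import subprocess

def ceph(*args):
    """Run a ceph CLI command and return its parsed JSON output."""
    out = subprocess.check_output(["ceph"] + list(args) + ["--format", "json"])
    return json.loads(out) if out.strip() else []

# Incomplete pgs show up among the stuck/inactive ones; they stay
# blocked until enough OSDs holding their data are up to reach min_size.
print(json.dumps(ceph("pg", "dump_stuck", "inactive"), indent=2))

# min_size of the pool the pg belongs to.
print(json.dumps(ceph("osd", "pool", "get", "rbd", "min_size"), indent=2))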

If your setup is susceptible to situations where you can lose all
replicas of a pg, you will lose data; there's not much that can be done
about that. Ceph actually begins to create new replicas to replace the
missing ones after "mon osd down out interval" expires, so actual loss
should not happen unless you lose (and can't recover) <size> OSDs on
separate hosts (with the default crush map) more or less simultaneously.
Before going into production you should know how long Ceph takes to
fully recover from a disk or host failure by testing it under load. Your
setup might not be robust if it doesn't have the disk space or the speed
needed to recover quickly from such a failure.
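
To put a number on that last point, something as simple as the sketch
below gives a rough recovery time (Python polling "ceph status"; start
it right after you trigger the failure, and note that the JSON field
names vary a bit between releases). Recovery itself only starts once the
failed OSDs are marked out, i.e. "mon osd down out interval" seconds
after they go down:

#!/usr/bin/env python
# Rough sketch: watch how long the cluster stays degraded after a
# failure you trigger yourself (stopping an OSD or a host under load).
# JSON field names differ slightly between releases, hence the .get().
import json
import subprocess
import time

def degraded_objects():
    out = subprocess.check_output(["ceph", "status", "--format", "json"])
    return json.loads(out).get("pgmap", {}).get("degraded_objects", 0)

start = time.time()
while True:
    remaining = degraded_objects()
    print("%6ds  degraded objects: %d" % (time.time() - start, remaining))
    if remaining == 0:
        break
    time.sleep(30)
print("back to full redundancy after ~%ds" % (time.time() - start))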

Lionel


