Re: Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

Hi Nico,
Yes, Ceph is production ready. Yes, people are using it in production for qemu. The last time I heard, Ceph was surveyed as the most popular backend for OpenStack Cinder in production.

When using RBD in production, it really is critically important to (a) use 3 replicas and (b) pay attention to pg distribution early on so that you don't end up with unbalanced OSDs.
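
Concretely, (a) and (b) boil down to something like the following; the pool name "rbd" and the numbers are only examples, adjust them to your cluster:

    # three copies; only accept writes while at least two copies are reachable
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2

    # how many PGs does the pool have, and how evenly do they land on the OSDs?
    ceph osd pool get rbd pg_num
    ceph pg dump osds        # per-OSD usage (newer releases also have: ceph osd df)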

Replication is especially important for RBD because you _must_not_ever_lose_an_entire_pg_. Parts of every single rbd device are stored on every single PG... So losing a PG means you lost random parts of every single block device. If this happens, the only safe course of action is to restore from backups. But the whole point of Ceph is that it enables you to configure adequate replication across failure domains, which makes this scenario very very very unlikely to occur.
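
You can see that striping for yourself with something like the commands below. The image name, the object prefix and the object id are placeholders (and format-1 images use an "rb.0." prefix instead of "rbd_data."):

    rbd info rbd/vm-disk-1                       # note the block_name_prefix
    rados -p rbd ls | grep rbd_data | head       # the image's many 4 MB objects
    # where does one of those objects live? one specific PG and its OSDs:
    ceph osd map rbd rbd_data.1234ab5678cd.0000000000000000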

I don't know why you were getting kernel panics. It's probably advisable to stick to the most recent mainline kernel when using kRBD.
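
If you want to dig into it, the obvious first things to collect on the client host are roughly:

    uname -r                          # kernel actually running the krbd client
    dmesg | grep -iE 'rbd|libceph'    # any oops / hung-task traces before the panic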

Cheers, Dan

On 7 Jan 2015 20:45, Nico Schottelius <nico-eph-users@xxxxxxxxxxxxxxx> wrote:
Good evening,

we also tried to rescue data *from* our old / broken pool by map'ing the
rbd devices, mounting them on a host and rsync'ing away as much as
possible.
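
For reference, such a rescue attempt boils down to roughly the following (pool, image and target names are placeholders); note that reads hitting an incomplete PG simply hang, so a read-only mount and an rsync timeout only limit the damage:

    rbd map oldpool/vm-disk-1
    mount -o ro /dev/rbd0 /mnt/rescue
    rsync -a --timeout=300 /mnt/rescue/ backuphost:/srv/rescue/vm-disk-1/
    umount /mnt/rescue && rbd unmap /dev/rbd0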

However, after some time rsync got completely stuck, and eventually the
host which had mounted the rbd-mapped devices decided to kernel panic, at
which point we decided to drop the pool and go with a backup.

This story and the one from Christian make me wonder:

    Is anyone using ceph as a backend for qemu VM images in production?

And:

    Has anyone on the list been able to recover from a pg incomplete /
    stuck situation like ours?
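
    (That is, the kind of state reported by something like the following,
    where "2.1f" stands in for the affected placement group:)

        ceph health detail | grep -i incomplete
        ceph pg dump_stuck inactive
        ceph pg 2.1f query            # look at the "recovery_state" section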

Reading about the issues here on the list gives me the impression that
ceph as a piece of software is stuck/incomplete and has not yet become
"clean" enough for production (sorry for the pun on the pg states).

Cheers,

Nico

Christian Eichelmann [Tue, Dec 30, 2014 at 12:17:23PM +0100]:
> Hi Nico and all others who answered,
>
> After some more attempts to somehow get the pgs into a working state (I
> tried force_create_pg, which put them into the creating state. But that
> was obviously not true, since after rebooting one of the OSDs holding
> them, they went back to incomplete), I decided to save what could be saved.
>
> I created a new pool, created a new image there, and mapped both the old
> image from the old pool and the new image from the new pool to a machine,
> in order to copy the data at the POSIX level.
>
> Unfortunately, formatting the image from the new pool hangs after some
> time. So it seems that the new pool is suffering from the same problem
> as the old pool, which is totally incomprehensible to me.
>
> Right now, it seems like Ceph is giving me no option either to save
> some of the still-intact rbd volumes, or to create a new pool alongside
> the old one to at least enable our clients to send data to ceph again.
>
> To tell the truth, I guess that will mean the end of our ceph
> project (which has already been running for 9 months).
>
> Regards,
> Christian
>
> On 29.12.2014 15:59, Nico Schottelius wrote:
> > Hey Christian,
> >
> > Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:
> >> [incomplete PG / RBD hanging, osd lost also not helping]
> >
> > that is very interesting to hear, because we had a similar situation
> > with ceph 0.80.7 and had to re-create a pool after I deleted 3 pg
> > directories to allow the OSDs to start once the disk had filled up completely.
> >
> > So I am sorry not to be able to give you a good hint, but I am very
> > interested in seeing your problem solved, as it is a show stopper for
> > us, too. (*)
> >
> > Cheers,
> >
> > Nico
> >
> > (*) We migrated from sheepdog to gluster to ceph, and so far sheepdog
> >     seems to run much more smoothly. The first, however, is not supported
> >     by opennebula directly, and the second is not flexible enough to host
> >     our heterogeneous infrastructure (mixed disk sizes/amounts) - so we
> >     are using ceph at the moment.
> >
>
>
> --
> Christian Eichelmann
> Systemadministrator
>
> 1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
> Brauerstraße 48 · DE-76135 Karlsruhe
> Telefon: +49 721 91374-8026
> christian.eichelmann@xxxxxxxx
>
> Amtsgericht Montabaur / HRB 6484
> Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
> Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
> Aufsichtsratsvorsitzender: Michael Scheeren

--
New PGP key: 659B 0D91 E86E 7E24 FD15  69D0 C729 21A1 293F 2D24
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com