Good evening,

we also tried to rescue data *from* our old/broken pool by mapping the
rbd devices, mounting them on a host and rsyncing away as much as
possible. However, after some time rsync got completely stuck, and
eventually the host that had mounted the mapped rbd devices kernel
panicked, at which point we decided to drop the pool and go with a
backup.

This story and Christian's make me wonder: is anyone using Ceph as a
backend for qemu VM images in production? And: has anyone on the list
been able to recover from a pg incomplete / stuck situation like ours?

Reading about the issues on the list here gives me the impression that
Ceph as a software is stuck/incomplete and has not yet become "clean"
enough for production (sorry for the pun).

Cheers,

Nico

Christian Eichelmann [Tue, Dec 30, 2014 at 12:17:23PM +0100]:
> Hi Nico and all others who answered,
> 
> After some more attempts to somehow get the pgs into a working state
> (I tried force_create_pg, which put them into the creating state; but
> that was obviously not real, since after rebooting one of the
> containing OSDs they went back to incomplete), I decided to save what
> can be saved.
> 
> I created a new pool, created a new image there, and mapped the old
> image from the old pool and the new image from the new pool to one
> machine, in order to copy the data at the POSIX level.
> 
> Unfortunately, formatting the image from the new pool hangs after some
> time. So it seems that the new pool is suffering from the same problem
> as the old pool, which is completely incomprehensible to me.
> 
> Right now, it seems like Ceph is giving me no option to either save
> some of the still intact rbd volumes, or to create a new pool alongside
> the old one to at least enable our clients to send data to Ceph again.
> 
> To tell the truth, I guess that will mean the end of our Ceph project
> (which has been running for 9 months already).
> 
> Regards,
> Christian
> 
> On 29.12.2014 15:59, Nico Schottelius wrote:
> > Hey Christian,
> > 
> > Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:
> >> [incomplete PG / RBD hanging, osd lost also not helping]
> > 
> > that is very interesting to hear, because we had a similar situation
> > with ceph 0.80.7 and had to re-create a pool after I deleted 3 pg
> > directories to allow the OSDs to start once the disk had filled up
> > completely.
> > 
> > So I am sorry not to be able to give you a good hint, but I am very
> > interested in seeing your problem solved, as it is a show stopper for
> > us, too. (*)
> > 
> > Cheers,
> > 
> > Nico
> > 
> > (*) We migrated from sheepdog to gluster to ceph, and so far sheepdog
> > seems to run much more smoothly. The first one is, however, not
> > supported by OpenNebula directly, and the second one is not flexible
> > enough to host our heterogeneous infrastructure (mixed disk
> > sizes/amounts) - so we are using ceph at the moment.
> > 
> 
> -- 
> Christian Eichelmann
> Systemadministrator
> 
> 1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
> Brauerstraße 48 · DE-76135 Karlsruhe
> Telefon: +49 721 91374-8026
> christian.eichelmann@xxxxxxxx
> 
> Amtsgericht Montabaur / HRB 6484
> Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
> Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
> Aufsichtsratsvorsitzender: Michael Scheeren

-- 
New PGP key: 659B 0D91 E86E 7E24 FD15 69D0 C729 21A1 293F 2D24
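
PS: in case it is useful to anyone trying the same rescue path, this is
roughly what we ran (a sketch from memory only; the pool, image, device
and target paths are just examples, and the ro,noload mount assumes an
ext4 filesystem inside the image):

    # map the image from the broken pool; the kernel creates /dev/rbdN
    rbd map oldpool/vm-disk-1

    # mount read-only and skip journal replay, to avoid writing to a
    # possibly inconsistent image (noload is an ext4 mount option)
    mkdir -p /mnt/rescue
    mount -o ro,noload /dev/rbd0 /mnt/rescue

    # copy off whatever is still readable; --partial keeps partially
    # transferred files if rsync hangs and has to be killed
    rsync -aHAX --partial /mnt/rescue/ /backup/vm-disk-1/

    # clean up once (or if) rsync finishes
    umount /mnt/rescue
    rbd unmap /dev/rbd0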