Re: Fwd: monitor crashing

On Tue, 13 Oct 2015, Luis Periquito wrote:
> the store.db dir is 3.4GB big :(
> 
> can I do it on my side?

Never mind, I was able to reproduce it from the Bugzilla report.  I've 
pushed a branch, wip-ecpool-hammer.  Not sure which distro you're on, but 
packages will appear at gitbuilder.ceph.com in 30-45 minutes.  This fixes 
the mon crash, which will let you delete the pool.  I suggest stopping the 
OSDs before starting the mon with this build, or else they might get pg 
create messages and crash too.  Once the pool is removed you can start 
them again; they shouldn't need to be upgraded.
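
Roughly, the recovery would look something like the below (just a sketch; 
the service commands assume Hammer-era upstart/sysvinit packaging, and the 
pool name and mon id are placeholders for your own):

    # stop the OSDs first so they don't get pg create messages for the bad pool
    sudo stop ceph-osd-all            # upstart; or: sudo service ceph stop osd
    # install the wip-ecpool-hammer packages on the mon node(s), then start the mon
    sudo start ceph-mon id=<mon-id>   # or: sudo service ceph start mon
    # delete the broken EC pool (pool name is a placeholder)
    ceph osd pool delete <ecpool> <ecpool> --yes-i-really-really-mean-it
    # with the pool gone, bring the OSDs back (no upgrade needed)
    sudo start ceph-osd-all           # or: sudo service ceph start osd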

Note that the latest hammer doesn't let you create the pool at all because 
it fails the crush safety check (I had to disable the check to reproduce 
this), so that's good at least!
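
For reference, the sort of profile that triggers this probably looked 
something like the below (the k/m values and names are my guesses; only the 
l=3-instead-of-l=4 typo and the locality shortage are from Luis's report):

    # hypothetical LRC profile: with l=3 you get four local groups, so the rule
    # wants four distinct ruleset-locality buckets; l=4 would only need three
    ceph osd erasure-code-profile set lrc-typo plugin=lrc k=8 m=4 l=3 \
        ruleset-failure-domain=host ruleset-locality=rack
    # on current hammer this create is now (correctly) rejected by the crush
    # safety check
    ceph osd pool create ecpool 256 256 erasure lrc-typo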

sage

> 
> On Tue, Oct 13, 2015 at 2:25 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Tue, 13 Oct 2015, Luis Periquito wrote:
> >> Any ideas? I'm growing desperate :(
> >>
> >> I've tried compiling from source, and including
> >> https://github.com/ceph/ceph/pull/5276, but it still crashes on boot
> >> of the ceph-mon
> >
> > If you can email a (link to a) tarball of your mon data directory I'd love
> > to extract the osdmap and see why crush is crashing.. it's obviously not
> > supposed to do that (even with a bad rule).  You can also use
> > the ceph-post-file utility.
> >
> > Thanks!
> > sage
> >
> >
> >>
> >> ---------- Forwarded message ----------
> >> From: Luis Periquito <periquito@xxxxxxxxx>
> >> Date: Tue, Oct 13, 2015 at 12:26 PM
> >> Subject: Re: monitor crashing
> >> To: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
> >>
> >>
> >> I'm currently running Hammer (0.94.3), created an invalid LRC profile
> >> (typo in the l=, should have been l=4 but was l=3, and now I don't
> >> have enough different ruleset-locality) and created a pool. Is there
> >> any way to delete this pool? Remember, I can't start the ceph-mon...
> >>
> >> On Tue, Oct 13, 2015 at 11:56 AM, Luis Periquito <periquito@xxxxxxxxx> wrote:
> >> > It seems I've hit this bug:
> >> > https://bugzilla.redhat.com/show_bug.cgi?id=1231630
> >> >
> >> > Is there any way I can recover this cluster? It worked in our test
> >> > cluster, but crashed the production one...