I just tried the branch, and the mon starts OK; here is the log: https://gist.github.com/yuchangyuan/3138952ac60508d18aed
But ceph -s and ceph -w just block without returning any message (I only started the monitor, no mds or osd).
On Sun, Aug 4, 2013 at 12:23 PM, Yu Changyuan <reivzy@xxxxxxxxx> wrote:
On Sun, Aug 4, 2013 at 12:16 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
It looks like the auth state wasn't trimmed properly. It also sort of
looks like you aren't using authentication on this cluster... is that
true? (The keyring file was empty.)
Yes, you're right, I disabled auth. It's just a personal cluster, so the simpler the better.

This looks like a trim issue, but I don't remember what all we fixed since
.1.. that was a while ago! We certainly haven't seen anything like this
recently.
I pushed a branch wip-mon-skip-auth-cuttlefish that skips the missing
incrementals and will get your mon up, but you may lose some auth keys.
If auth is on, you'll need to add them back again. If not, it may just
work with this.
You can grab the packages from
http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish
or whatever the right dir is for your distro when they appear in about 15
minutes. Let me know if that resolves it.

Thank you for your work, I will try as soon as possible.
PS: My distro is Gentoo, so maybe I should build from source directly.
sage
On Sun, 4 Aug 2013, Yu Changyuan wrote:
>
>
>
> On Sun, Aug 4, 2013 at 12:13 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> On Sat, 3 Aug 2013, Yu Changyuan wrote:
> > I run a tiny ceph cluster with only one monitor. After rebooting the
> > system, the monitor refuses to start.
> > I tried to start ceph-mon manually with the command 'ceph-mon -f -i a',
> below is
> > first few lines of the output:
> >
> > starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data
> > /var/lib/ceph/mon/ceph-a fsid
> 554bee60-9602-4017-a6e1-ceb6907a218c
> > mon/AuthMonitor.cc: In function 'virtual void
> > AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time
> 2013-08-03
> > 20:27:29.208156
> > mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
> >
> > The full log is at:
> https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8
>
> This is 0.61.1. Can you try again with 0.61.7 to rule out anything
> there?
>
>
> I just tried 0.61.7, still out of luck. Here is the log:
> https://gist.github.com/yuchangyuan/34743c0abf1bfd8ef243
>
>
> > So, is there any way to make the monitor work again?
> >
> > I have a backup of /var/lib/ceph/mon/ceph-a from 2013-08-01, and I can
> > successfully start the monitor with those files,
> > but rados and other commands do not work because the osds keep saying
> > the monitor is the wrong node (that's right, it's actually the node
> > from 2 days ago).
>
> In general that is not going to work well as the cluster does not like
> to
> warp back in time. If it does not start with .7 (I suspect it won't),
> can
> you send us a tarball of the mon data directory so we can see what is
> awry?
>
>
> OK, I will send the tarball of /var/lib/ceph/mon/ceph-a to you directly.
>
>
> sage
>
>
>
>
> --
> Best regards,
> Changyuan
>
>
--
Best regards,
Changyuan
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com