Re: About single monitor recovery

It looks like the auth state wasn't trimmed properly.  It also sort of 
looks like you aren't using authentication on this cluster... is that 
true?  (The keyring file was empty.)

This looks like a trim issue, but I don't remember everything we have fixed 
since 0.61.1; that was a while ago!  We certainly haven't seen anything like 
this recently.

I pushed a branch wip-mon-skip-auth-cuttlefish that skips the missing 
incrementals and will get your mon up, but you may lose some auth keys.  
If auth is on, you'll need to add them back again.  If not, it may just 
work with this.
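
To check whether auth is on, a rough sketch like the one below may help.  It 
only reports auth options that are set explicitly in the [global] section of 
ceph.conf text; it says nothing about compiled-in defaults.  The option names 
listed are assumptions taken from the usual Ceph configuration docs, the 
helper name auth_settings is made up for illustration, and the 
/etc/ceph/ceph.conf path in the usage note is the conventional location, not 
something confirmed in this thread.

```python
import configparser

# Assumed option names (spaces, underscores and dashes are treated alike).
AUTH_KEYS = (
    "auth cluster required",
    "auth service required",
    "auth client required",
    "auth supported",
)


def auth_settings(conf_text):
    """Return auth options set explicitly under [global] in ceph.conf text."""
    # Strip indentation so configparser does not read indented options
    # as continuations of the previous value.
    flat = "\n".join(line.strip() for line in conf_text.splitlines())
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(flat)

    found = {}
    if parser.has_section("global"):
        for key, value in parser.items("global"):
            # Normalise "auth_cluster_required" / "auth-cluster-required".
            norm = " ".join(key.replace("_", " ").replace("-", " ").split())
            if norm in AUTH_KEYS:
                found[norm] = value.strip()
    return found
```

For example, auth_settings(open("/etc/ceph/ceph.conf").read()) returns a dict 
of whatever auth options are set; a value of "none" for each means auth was 
explicitly turned off.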

You can grab the packages from

 http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish

or whatever the right dir is for your distro when they appear in about 15 
minutes.  Let me know if that resolves it.

sage


On Sun, 4 Aug 2013, Yu Changyuan wrote:

> 
> 
> 
> On Sun, Aug 4, 2013 at 12:13 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>       On Sat, 3 Aug 2013, Yu Changyuan wrote:
>       > I run a tiny ceph cluster with only one monitor. After rebooting
>       > the system, the monitor refuses to start.
>       > I tried to start ceph-mon manually with the command
>       > 'ceph-mon -f -i a'; below are the first few lines of the output:
>       >
>       > starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data
>       > /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
>       > mon/AuthMonitor.cc: In function 'virtual void
>       > AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time
>       > 2013-08-03 20:27:29.208156
>       > mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
>       >
>       > The full log is at:
>       > https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8
> 
> This is 0.61.1.  Can you try again with 0.61.7 to rule out anything
> there?
> 
>  
> I just tried 0.61.7, still out of luck. Here is the log: 
> https://gist.github.com/yuchangyuan/34743c0abf1bfd8ef243
> 
> 
>       > So, is there any way to make the monitor work again?
>       >
>       > I have a backup of /var/lib/ceph/mon/ceph-a from 2013-08-01 and
>       > successfully started the monitor with those files, but rados and
>       > other commands do not work because the OSDs keep saying the
>       > monitor is the wrong node (that's right, it's actually the node
>       > from 2 days ago).
> 
> In general that is not going to work well as the cluster does not like to
> warp back in time.  If it does not start with .7 (I suspect it won't), can
> you send us a tarball of the mon data directory so we can see what is
> awry?
> 
>  
> OK, I will send the tarball of /var/lib/ceph/mon/ceph-a to you directly.
>  
> 
>       sage
> 
> 
> 
> 
> --
> Best regards,
> Changyuan
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
