Re: About single monitor recovery

On Sun, 4 Aug 2013, Yu Changyuan wrote:
> And here is the ceph-mon log, with debug_mon set to 10. I ran the "ceph -s"
> command (which blocked) on 192.168.1.2 while recording this log.
> 
> https://gist.github.com/yuchangyuan/ba3e72452215221d1e82

I pushed one more patch to that branch that should get you up.  This one 
should go to master as well.

sage
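
For anyone reading this thread in the archive: the crash discussed below shows up in the ceph-mon output as a "FAILED assert" line. Here is a minimal Python sketch for pulling such lines out of a saved log. It assumes the lines have the same shape as the excerpt quoted further down (source file, line number, then the asserted expression). The names find_failed_asserts and FailedAssert are made up for this example and are not part of Ceph.

```python
import re
from typing import Iterator, NamedTuple

# Assumed line shape, taken from the excerpt quoted in this thread:
#   mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
ASSERT_RE = re.compile(
    r"(?P<file>\S+\.(?:cc|h)): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)"
)


class FailedAssert(NamedTuple):
    """One failed assertion found in a log (hypothetical helper type)."""
    source_file: str
    line: int
    expr: str


def find_failed_asserts(log_text: str) -> Iterator[FailedAssert]:
    """Yield every 'FAILED assert' occurrence found in log_text."""
    for m in ASSERT_RE.finditer(log_text):
        yield FailedAssert(m.group("file"), int(m.group("line")), m.group("expr"))


if __name__ == "__main__":
    # Sample text copied from the log excerpt in this thread.
    sample = (
        "starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data "
        "/var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c\n"
        "mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)\n"
    )
    for hit in find_failed_asserts(sample):
        print(f"{hit.source_file}:{hit.line} assert({hit.expr})")
```

To scan a real log, read the file into a string and pass it to find_failed_asserts. If your build formats assertion failures differently, the pattern will need adjusting.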

> 
> 
> On Sun, Aug 4, 2013 at 3:25 PM, Yu Changyuan <reivzy@xxxxxxxxx> wrote:
>       I just tried the branch, and the mon starts OK. Here is the log:
>       https://gist.github.com/yuchangyuan/3138952ac60508d18aed
>       But ceph -s and ceph -w just block without returning any message (I
>       only started the monitor, no mds or osd).
> 
> 
> 
> On Sun, Aug 4, 2013 at 12:23 PM, Yu Changyuan <reivzy@xxxxxxxxx> wrote:
> 
>       On Sun, Aug 4, 2013 at 12:16 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>             It looks like the auth state wasn't trimmed properly.  It
>             also sort of looks like you aren't using authentication on
>             this cluster... is that true?  (The keyring file was empty.)
> 
> Yes, you're right, I disabled auth. It's just a personal
> cluster, so the simpler the better.
> 
>       This looks like a trim issue, but I don't remember what all we
>       fixed since 0.61.1... that was a while ago!  We certainly haven't
>       seen anything like this recently.
> 
>       I pushed a branch wip-mon-skip-auth-cuttlefish that skips the
>       missing incrementals and will get your mon up, but you may lose
>       some auth keys.  If auth is on, you'll need to add them back
>       again.  If not, it may just work with this.
> 
>       You can grab the packages from
>
>       http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish
>
>       or whatever the right dir is for your distro when they appear in
>       about 15 minutes.  Let me know if that resolves it.
> 
>  
> Thank you for your work, I will try it as soon as possible.
> PS: My distro is Gentoo, so maybe I should build from source
> directly.
>  
> 
>       sage
> 
> 
>       On Sun, 4 Aug 2013, Yu Changyuan wrote:
> 
>       >
>       >
>       >
>       > On Sun, Aug 4, 2013 at 12:13 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>       >       On Sat, 3 Aug 2013, Yu Changyuan wrote:
>       >       > I run a tiny ceph cluster with only one monitor. After
>       >       > rebooting the system, the monitor refuses to start.
>       >       > I tried to start ceph-mon manually with the command
>       >       > 'ceph-mon -f -i a'; below are the first few lines of the
>       >       > output:
>       >       >
>       >       > starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data
>       >       > /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
>       >       > mon/AuthMonitor.cc: In function 'virtual void
>       >       > AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time
>       >       > 2013-08-03 20:27:29.208156
>       >       > mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
>       >       >
>       >       > The full log is at:
>       >       > https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8
>       >
>       > This is 0.61.1.  Can you try again with 0.61.7 to rule out
>       > anything there?
>       >
>       >  
>       > I just tried 0.61.7, still out of luck. Here is the log:
>       > https://gist.github.com/yuchangyuan/34743c0abf1bfd8ef243
>       >
>       >
>       >       > So, is there any way to make the monitor work again?
>       >       >
>       >       > I have a backup of /var/lib/ceph/mon/ceph-a from 2013-08-01,
>       >       > and the monitor starts successfully with these files,
>       >       > but rados and other commands do not work because the osds
>       >       > keep saying the monitor is the wrong node (that's right,
>       >       > it's actually the node from 2 days ago).
>       >
>       > In general that is not going to work well, as the cluster does
>       > not like to warp back in time.  If it does not start with .7 (I
>       > suspect it won't), can you send us a tarball of the mon data
>       > directory so we can see what is awry?
>       >
>       >  
>       > OK, I will send the tarball of /var/lib/ceph/mon/ceph-a to you
>       > directly.
>       >  
>       >
>       >       sage
>       >
>       >
>       >
>       >
>       > --
>       > Best regards,
>       > Changyuan
>       >
>       >
> 
> 
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
