Re: About single monitor recovery

Yu Changyuan <reivzy@xxxxxxxxx> · Mon, 5 Aug 2013 19:31:15 +0800

The good news is that with the new patch, ceph starts OK, cephfs mounts OK, and the KVM virtual machine that boots from rbd starts OK (and seems to run fine). I checked the timestamp of the last file written to cephfs, and it is fairly close to the time of the reboot (the one after which ceph stopped working). Since I don't have any other way to check the integrity of the files stored in cephfs, I just picked some video files at random and played them; all seem OK.

So, thank you very much.
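
For anyone who wants to repeat the timestamp check, here is a minimal sketch of one way to find the most recently modified file under a mounted tree. It is not part of the original message: the function name newest_mtime and the mount point in the usage note are hypothetical, and it only reads file metadata, so it says nothing about data integrity.

```python
import os


def newest_mtime(root):
    """Return (mtime, path) of the most recently modified file entry
    under root, or None if the tree contains no files."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that a dangling symlink does not raise
                mtime = os.lstat(path).st_mtime
            except OSError:
                # the entry vanished or is unreadable; skip it
                continue
            if newest is None or mtime > newest[0]:
                newest = (mtime, path)
    return newest
```

Calling newest_mtime("/mnt/cephfs") (an assumed mount point) and comparing the returned time with the reboot time gives a rough idea of how much recent data survived.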

However, I am not using the latest version of the files in /var/lib/ceph/mon/ceph-a.
With the latest files, ceph-mon starts up OK and ceph -s returns, but the osds still think the monitor is the wrong node and refuse to work.

So I tried the files from 2 days earlier (Aug 1st) to see what would happen, and something did happen: ceph-osd started to work.
So I am a bit curious why the patched version works with the ceph-mon data from 2 days earlier but the original version does not.

More importantly, do I need any extra steps to make the currently running cluster work with a normal (unpatched) version of ceph?
And is there any chance that the current cluster will run into problems in the future if I keep the current state and take no extra steps?

On Mon, Aug 5, 2013 at 12:39 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:

On Sun, 4 Aug 2013, Yu Changyuan wrote:
> And here is the log of ceph-mon, with debug_mon set to 10. I ran the "ceph -s"
> command (which blocked) on 192.168.1.2 while recording this log.
>
> https://gist.github.com/yuchangyuan/ba3e72452215221d1e82

I pushed one more patch to that branch that should get you up.  This one
should go to master as well.

sage

>
> On Sun, Aug 4, 2013 at 3:25 PM, Yu Changyuan <reivzy@xxxxxxxxx> wrote:
>       I just tried the branch, and the mon starts OK. Here is the log:
>       https://gist.github.com/yuchangyuan/3138952ac60508d18aed
>       But ceph -s or ceph -w just blocks, without returning any message (I
>       only started the monitor, no mds or osd).
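
When a status command blocks like this, it can help to run it under a time limit so that a script collecting logs does not hang with it. The following is a minimal sketch using only Python's standard library; the helper name run_with_timeout and the 30-second limit in the usage note are assumptions, not anything from this thread.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments). Return (returncode, output), or
    (None, "") if the command did not finish within timeout_s seconds."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # merge stderr into the captured output
            universal_newlines=True,    # decode output as text
            timeout=timeout_s,          # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return proc.returncode, proc.stdout
```

For example, run_with_timeout(["ceph", "-s"], 30) would return (None, "") instead of blocking forever if the monitor never answers.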

>
> On Sun, Aug 4, 2013 at 12:23 PM, Yu Changyuan <reivzy@xxxxxxxxx> wrote:
>
>       On Sun, Aug 4, 2013 at 12:16 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>             It looks like the auth state wasn't trimmed properly.  It also
>             sort of looks like you aren't using authentication on this
>             cluster... is that true?  (The keyring file was empty.)
>
> Yes, you're right, I disabled auth. It's just a personal cluster, so the
> simpler the better.
>
>       This looks like a trim issue, but I don't remember what all we fixed
>       since .1... that was a while ago!  We certainly haven't seen anything
>       like this recently.
>
>       I pushed a branch wip-mon-skip-auth-cuttlefish that skips the missing
>       incrementals and will get your mon up, but you may lose some auth keys.
>       If auth is on, you'll need to add them back again.  If not, it may just
>       work with this.
>
>       You can grab the packages from
>
>       http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish
>
>       or whatever the right dir is for your distro when they appear in about
>       15 minutes.  Let me know if that resolves it.
>
> Thank you for your work, I will try it as soon as possible.
> PS: My distro is Gentoo, so maybe I should build from source directly.
>
>       sage
>

>       On Sun, 4 Aug 2013, Yu Changyuan wrote:
>       > On Sun, Aug 4, 2013 at 12:13 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>       >       On Sat, 3 Aug 2013, Yu Changyuan wrote:
>       >       > I run a tiny ceph cluster with only one monitor. After a
>       >       > reboot of the system, the monitor refuses to start.
>       >       > I tried to start ceph-mon manually with the command
>       >       > 'ceph-mon -f -i a'; below are the first few lines of the output:
>       >       >
>       >       > starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
>       >       > mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time 2013-08-03 20:27:29.208156
>       >       > mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
>       >       >
>       >       > The full log is at:
>       >       > https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8
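
When scanning a long monitor log for a failure like the one quoted above, a small parser can pull out the source file, line number, and failed expression. This is a minimal sketch that assumes the "FAILED assert" line has exactly the shape shown in this thread; the name parse_failed_assert is hypothetical, and other log formats may need a different pattern.

```python
import re

# Matches lines shaped like:
#   mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)


def parse_failed_assert(text):
    """Return (file, line_number, expression) for a FAILED assert line,
    or None if the text does not contain one."""
    m = ASSERT_RE.search(text.strip())
    if m is None:
        return None
    return m.group("file"), int(m.group("line")), m.group("expr")
```

Applied to the line from the output above, it returns ("mon/AuthMonitor.cc", 147, "ret == 0"), which is enough to locate the assertion in the source tree of the matching release.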

>       >
>       > This is 0.61.1.  Can you try again with 0.61.7 to rule out anything
>       > there?
>       >
>       > I just tried 0.61.7, still out of luck. Here is the log:
>       > https://gist.github.com/yuchangyuan/34743c0abf1bfd8ef243
>       >
>       >       > So, is there any way to make the monitor work again?
>       >       >
>       >       > I have a backup of /var/lib/ceph/mon/ceph-a from 2013-08-01, and
>       >       > successfully started the monitor with these files,
>       >       > but rados and other commands do not work because the osds keep
>       >       > saying the monitor is the wrong node (that's right, it's actually
>       >       > the node from 2 days ago).
>       >
>       > In general that is not going to work well as the cluster does not like
>       > to warp back in time.  If it does not start with .7 (I suspect it won't),
>       > can you send us a tarball of the mon data directory so we can see what
>       > is awry?
>       >
>       > OK, I will send the tarball of /var/lib/ceph/mon/ceph-a to you directly.
>       >
>       >       sage
>       >
>       > --
>       > Best regards,
>       > Changyuan

-- 
Best regards,
Changyuan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com