On Mon, 5 Aug 2013, Yu Changyuan wrote:
> The good news is, with new patch, ceph start OK, cephfs mount OK, and kvm
> virtual machine use rbd boot OK(and seems running ok), and I check the
> timestamp of last file write to cephfs, it's fair near to the time of
> reboot(which cause ceph not work any more). Since I don't have any other way
> to check the integrity of the files store in cephfs, I just randomly pick
> some video files, and play it, all seems OK.
>
> So, thank you very much.
>
> But, I do not use the last version of files in /var/lib/ceph/mon/ceph-a,
> with these files, ceph-mon startup ok, and ceph -s returns, but osd still
> think the monitor is wrong node and refuse to work.
> Then I think I may try the files of 2 day ago(Aug 1st) and see what happen,
> and something actually happen, that is ceph-osd start to work.
> So, I am a bit curious about why patched version work with the ceph-mon data
> 2 days ago but original version not,
> and what more important, do I need extra step to make current running ceph
> cluster to work with a normal version(not patched) ceph,
> and are there any chance that current cluster will run into problem in the
> future(keep current state and do not take any extra step).

I think you will be fine with the current state and switching back to normal
release code.  I'm confused why ceph-osds wouldn't start with the latest mon
data, but can't speculate too much without spending time analyzing your logs
from the failed startup.

Glad to hear you're back online!

sage

>
> On Mon, Aug 5, 2013 at 12:39 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> > On Sun, 4 Aug 2013, Yu Changyuan wrote:
> > > And here is the log of ceph-mon, with debug_mon set to 10, I run "ceph -s"
> > > command(which is blocked) on 192.168.1.2 during recording this log.
> > >
> > > https://gist.github.com/yuchangyuan/ba3e72452215221d1e82
> >
> > I pushed one more patch to that branch that should get you up.  This one
> > should go to master as well.
> >
> > sage
> >
> > > On Sun, Aug 4, 2013 at 3:25 PM, Yu Changyuan <reivzy@xxxxxxxxx> wrote:
> > > > I just try the branch, and mon start ok, here is the log:
> > > > https://gist.github.com/yuchangyuan/3138952ac60508d18aed
> > > > But ceph -s or ceph -w just block, without any message return(I
> > > > just start monitor, no mds or osd).
> > > >
> > > > On Sun, Aug 4, 2013 at 12:23 PM, Yu Changyuan <reivzy@xxxxxxxxx> wrote:
> > > > > On Sun, Aug 4, 2013 at 12:16 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> > > > > > It looks like the auth state wasn't trimmed properly.  It also sort of
> > > > > > looks like you aren't using authentication on this cluster... is that
> > > > > > true?  (The keyring file was empty.)
> > > > >
> > > > > Yes, you're right, I disable auth. It's just a personal
> > > > > cluster, so the simpler the better.
> > > > >
> > > > > > This looks like a trim issue, but I don't remember what all we fixed since
> > > > > > .1.. that was a while ago!  We certainly haven't seen anything like this
> > > > > > recently.
> > > > > >
> > > > > > I pushed a branch wip-mon-skip-auth-cuttlefish that skips the missing
> > > > > > incrementals and will get your mon up, but you may lose some auth keys.
> > > > > > If auth is on, you'll need to add them back again.  If not, it may just
> > > > > > work with this.
> > > > > >
> > > > > > You can grab the packages from
> > > > > >
> > > > > >  http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-mon-skip-auth-cuttlefish
> > > > > >
> > > > > > or whatever the right dir is for your distro when they appear in about 15
> > > > > > minutes.  Let me know if that resolves it.
> > > > >
> > > > > Thank you for your work, I will try as soon as possible.
> > > > > PS: My distro is Gentoo, so maybe I should build from source
> > > > > directly.
> > > > >
> > > > > > sage
> > > > > >
> > > > > > On Sun, 4 Aug 2013, Yu Changyuan wrote:
> > > > > > > On Sun, Aug 4, 2013 at 12:13 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> > > > > > > > On Sat, 3 Aug 2013, Yu Changyuan wrote:
> > > > > > > > > I run a tiny ceph cluster with only one monitor. After a reboot the system,
> > > > > > > > > the monitor refuse to start.
> > > > > > > > > I try to start ceph-mon manually with command 'ceph -f -i a', below is
> > > > > > > > > first few lines of the output:
> > > > > > > > >
> > > > > > > > > starting mon.a rank 0 at 192.168.1.10:6789/0 mon_data /var/lib/ceph/mon/ceph-a fsid 554bee60-9602-4017-a6e1-ceb6907a218c
> > > > > > > > > mon/AuthMonitor.cc: In function 'virtual void AuthMonitor::update_from_paxos()' thread 7f9e3b0db780 time 2013-08-03 20:27:29.208156
> > > > > > > > > mon/AuthMonitor.cc: 147: FAILED assert(ret == 0)
> > > > > > > > >
> > > > > > > > > The full log is at:
> > > > > > > > > https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8
> > > > > > > >
> > > > > > > > This is 0.61.1.  Can you try again with 0.61.7 to rule out anything
> > > > > > > > there?
> > > > > > >
> > > > > > > I just tried 0.61.7, still out of luck. Here is the log:
> > > > > > > https://gist.github.com/yuchangyuan/34743c0abf1bfd8ef243
> > > > > > >
> > > > > > > > > So, are there any way to make the monitor work again?
> > > > > > > > >
> > > > > > > > > I have a backup of /var/lib/ceph/mon/ceph-a in 2013-08-01, and success
> > > > > > > > > start the monitor with these files,
> > > > > > > > > but rados and other command not work because osd keep saying the monitor is
> > > > > > > > > the wrong node(that's right, it's actually the node 2 days ago).
> > > > > > > >
> > > > > > > > In general that is not going to work well as the cluster does not like to
> > > > > > > > warp back in time.  If it does not start with .7 (I suspect it won't), can
> > > > > > > > you send us a tarball of the mon data directory so we can see what is
> > > > > > > > awry?
> > > > > > >
> > > > > > > OK, I will send the tarball of /var/lib/ceph/mon/ceph-a to you directly.
> > > > > > >
> > > > > > > > sage
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > > Changyuan
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Changyuan
> > > >
> > > > --
> > > > Best regards,
> > > > Changyuan
> > >
> > > --
> > > Best regards,
> > > Changyuan
>
> --
> Best regards,
> Changyuan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com