Re: One of three monitors can not be started

张皓宇 <zhanghaoyu1988@xxxxxxxxxxx> · Wed, 1 Apr 2015 13:28:12 +0800

There is an admin socket (asok) on computer06.
I tried to start mon.computer06 again, but about two hours later it still has not come up.
There are now some different processes on computer06, and I don't know how to handle them:
root      7812     1  0 11:39 pts/4    00:00:00 python /usr/sbin/ceph-create-keys -i computer06
root     11025     1 12 09:02 pts/4    00:32:13 /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
root     35692  7812  0 12:59 pts/4    00:00:00 python /usr/bin/ceph --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.computer06.asok mon_status
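
The mon_status query that ceph-create-keys is running (PID 35692 above) can also be issued by hand to see which state the monitor reports. Below is a minimal Python sketch of that, assuming the socket path from the process list and assuming the reply is JSON with "name", "state" and "quorum" fields. The helper names are hypothetical, and a timeout is used because the call may block while the monitor is busy.

```python
import json
import subprocess

# Socket path taken from the process list above.
ASOK = "/var/run/ceph/ceph-mon.computer06.asok"


def mon_status_command(asok=ASOK):
    """Build the same admin-socket query that ceph-create-keys runs."""
    return ["ceph", "--cluster=ceph", "--admin-daemon=" + asok, "mon_status"]


def summarize(raw):
    """Pick a few fields out of mon_status JSON (field names are assumed)."""
    status = json.loads(raw)
    return {
        "name": status.get("name"),
        "state": status.get("state"),
        "quorum": status.get("quorum", []),
    }


def query(asok=ASOK, timeout=30):
    """Run the query; return None if the monitor does not answer in time."""
    try:
        out = subprocess.check_output(mon_status_command(asok), timeout=timeout)
    except (OSError, subprocess.SubprocessError):
        # Covers a missing ceph binary, a non-zero exit, and a timeout.
        return None
    return summarize(out.decode("utf-8"))
```

Calling query() on computer06 returns a small dict if the socket answers, or None if the monitor is still too busy to reply.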

I got the quorum_status from another running monitor:
{ "election_epoch": 508,
  "quorum": [
        0,
        1],
  "quorum_names": [
        "computer05",
        "computer04"],
  "quorum_leader_name": "computer04",
  "monmap": { "epoch": 4,
      "fsid": "471483e5-493f-41f6-b6f4-0187c13d156d",
      "modified": "2014-07-26 09:52:02.411967",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "computer04",
              "addr": "192.168.1.60:6789\/0"},
            { "rank": 1,
              "name": "computer05",
              "addr": "192.168.1.65:6789\/0"},
            { "rank": 2,
              "name": "computer06",
              "addr": "192.168.1.66:6789\/0"}]}} 
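
Read against the monmap, this output says that computer06 (rank 2) is a known monitor but is not part of the quorum. Here is a small Python sketch of that comparison, using a trimmed copy of the JSON above; the function name is hypothetical.

```python
import json

# Trimmed copy of the quorum_status output above (raw string keeps the "\/").
QUORUM_STATUS = r"""
{ "quorum": [0, 1],
  "quorum_names": ["computer05", "computer04"],
  "quorum_leader_name": "computer04",
  "monmap": { "epoch": 4,
      "mons": [
            { "rank": 0, "name": "computer04", "addr": "192.168.1.60:6789\/0"},
            { "rank": 1, "name": "computer05", "addr": "192.168.1.65:6789\/0"},
            { "rank": 2, "name": "computer06", "addr": "192.168.1.66:6789\/0"}]}}
"""


def mons_outside_quorum(status):
    """Return names of monmap members whose rank is not in the quorum list."""
    in_quorum = set(status["quorum"])
    return [m["name"] for m in status["monmap"]["mons"]
            if m["rank"] not in in_quorum]


status = json.loads(QUORUM_STATUS)
print(mons_outside_quorum(status))  # prints ['computer06']
```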

> Date: Tue, 31 Mar 2015 12:30:22 -0700
> Subject: Re: [ceph-users] One of three monitors can not be started
> From: greg@xxxxxxxxxxx
> To: zhanghaoyu1988@xxxxxxxxxxx
> CC: ceph-users@xxxxxxxxxxxxxx
> 
> On Tue, Mar 31, 2015 at 2:50 AM, 张皓宇 <zhanghaoyu1988@xxxxxxxxxxx> wrote:
> > Who can help me?
> >
> > One monitor in my ceph cluster can not be started.
> > Before that, I added '[mon] mon_compact_on_start = true' to
> > /etc/ceph/ceph.conf on three monitor hosts. Then I did 'ceph tell
> > mon.computer05 compact ' on computer05, which has a monitor on it.
> > When the store.db on computer05 shrank from 108G to 1G, mon.computer06 stopped,
> > and it has not been able to start since.
> >
> > If I start mon.computer06, it hangs in this state:
> > # /etc/init.d/ceph start mon.computer06
> > === mon.computer06 ===
> > Starting Ceph mon.computer06 on computer06...
> >
> > The process info is like this:
> > root 12149 3807 0 20:46 pts/27 00:00:00 /bin/sh /etc/init.d/ceph start
> > mon.computer06
> > root 12308 12149 0 20:46 pts/27 00:00:00 bash -c ulimit -n 32768;
> > /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid
> > -c /etc/ceph/ceph.conf
> > root 12309 12308 0 20:46 pts/27 00:00:00 /usr/bin/ceph-mon -i computer06
> > --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
> > root 12313 12309 19 20:46 pts/27 00:00:01 /usr/bin/ceph-mon -i computer06
> > --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
> >
> > Log on computer06 is like this:
> > 2015-03-30 20:46:54.152956 7fc5379d07a0  0 ceph version 0.72.2
> > (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
> > ...
> > 2015-03-30 20:46:54.759791 7fc5379d07a0  1 mon.computer06@-1(probing) e4
> > preinit clean up potentially inconsistent store state
> 
> So I haven't looked at this code in a while, but I think the monitor
> is trying to validate that it's consistent with the others. You
> probably want to dig around the monitor admin sockets and see what
> state each monitor is in, plus its perception of the others.
> 
> In this case, I think maybe mon.computer06 is trying to examine its
> whole store, but 100GB is a lot (way too much, in fact), so this can
> take a loooong time.
> 
> >
> > Sorry, my English is not good.
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
