Re: Reply: One of three monitors can not be started


 



I checked the cluster state, and it has recovered to HEALTH_OK. I don't know why.

Yesterday at 09:02 I started mon.computer06, but it could not start; the log is in attachment 0902.

At 16:38 I started mon.computer06 again, and it also got stuck with these processes:
/usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
/usr/sbin/ceph-create-keys -i computer06


But this morning it was simply working. The log is in attachment 1638. Can anyone explain that?





To: greg@xxxxxxxxxxx
From: zhanghaoyu1988@xxxxxxxxxxx
Subject: Reply: [ceph-users] One of three monitors can not be started
Date: Thu, 2 Apr 2015 07:53:19 +0800

It does not respond.

From: Gregory Farnum
Sent: 2015/4/2 1:01
To: 张皓宇
Subject: Re: [ceph-users] One of three monitors can not be started

On Tue, Mar 31, 2015 at 10:25 PM, 张皓宇 <zhanghaoyu1988@xxxxxxxxxxx> wrote:
> There is an asok on computer06.
> I tried to start mon.computer06; about two hours later,
> mon.computer06 still had not started,
> but there are some different processes on computer06, and I don't know how to
> handle them:
> root      7812     1  0 11:39 pts/4    00:00:00 python
> /usr/sbin/ceph-create-keys -i computer06

That's a thing that runs on every monitor invocation to make sure
necessary keys are in place; it's just stuck because the monitor isn't
working.

> root     11025     1 12 09:02 pts/4    00:32:13 /usr/bin/ceph-mon -i
> computer06 --pid-file /var/run/ceph/mon.computer06.pid -c
> /etc/ceph/ceph.conf

That's the monitor.

> root     35692  7812  0 12:59 pts/4    00:00:00 python /usr/bin/ceph
> --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.computer06.asok
> mon_status

This is an attempt of yours to invoke mon_status on the admin socket.
So you're saying the admin socket is there but it's not responding to
queries?
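
A query against a hung admin socket can block indefinitely, so it helps to give it a deadline. The following is a minimal Python sketch, assuming the `ceph` CLI is on the PATH and that `mon_status` prints JSON; the command line is the one from the process listing above, while the helper names `build_cmd` and `mon_status` and the 10-second timeout are illustrative choices, not part of Ceph.

```python
import json
import subprocess

def build_cmd(asok_path):
    # Same invocation as in the process listing quoted above.
    return ["ceph", "--cluster=ceph", "--admin-daemon=" + asok_path, "mon_status"]

def mon_status(asok_path, timeout=10.0, run=subprocess.run):
    """Return the parsed mon_status, or None if the socket does not answer in time.

    `run` is injectable only so the function can be exercised without a cluster.
    """
    try:
        proc = run(build_cmd(asok_path), capture_output=True, text=True,
                   timeout=timeout, check=True)
    except subprocess.TimeoutExpired:
        return None  # the socket exists but the monitor is not responding
    return json.loads(proc.stdout)
```

Under these assumptions, `mon_status("/var/run/ceph/ceph-mon.computer06.asok")` returning `None` would correspond to the "socket is there but not responding" case described here.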

>
>
> I got the quorum_status from another running monitor:
> { "election_epoch": 508,
>   "quorum": [
>         0,
>         1],
>   "quorum_names": [
>         "computer05",
>         "computer04"],
>   "quorum_leader_name": "computer04",
>   "monmap": { "epoch": 4,
>       "fsid": "471483e5-493f-41f6-b6f4-0187c13d156d",
>       "modified": "2014-07-26 09:52:02.411967",
>       "created": "0.000000",
>       "mons": [
>             { "rank": 0,
>               "name": "computer04",
>               "addr": "192.168.1.60:6789\/0"},
>             { "rank": 1,
>               "name": "computer05",
>               "addr": "192.168.1.65:6789\/0"},
>             { "rank": 2,
>               "name": "computer06",
>               "addr": "192.168.1.66:6789\/0"}]}}

And that indicates mon.computer04 and mon.computer05 are working and
in a quorum together to make progress.
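
The same reading can be done mechanically: any monitor listed in the monmap but missing from `quorum_names` is out of quorum. Here is a small Python sketch using a trimmed copy of the JSON quoted above; the helper names `out_of_quorum` and `has_majority` are made up for illustration.

```python
import json

# Trimmed copy of the quorum_status output quoted above.
QUORUM_STATUS = r'''
{ "election_epoch": 508,
  "quorum": [0, 1],
  "quorum_names": ["computer05", "computer04"],
  "quorum_leader_name": "computer04",
  "monmap": { "epoch": 4,
      "mons": [
            { "rank": 0, "name": "computer04", "addr": "192.168.1.60:6789\/0"},
            { "rank": 1, "name": "computer05", "addr": "192.168.1.65:6789\/0"},
            { "rank": 2, "name": "computer06", "addr": "192.168.1.66:6789\/0"}]}}
'''

def out_of_quorum(status):
    """Names of monitors in the monmap that are not in the current quorum."""
    in_quorum = set(status["quorum_names"])
    return [m["name"] for m in status["monmap"]["mons"] if m["name"] not in in_quorum]

def has_majority(status):
    """True if the quorum holds more than half of the monitors in the monmap."""
    return len(status["quorum"]) > len(status["monmap"]["mons"]) // 2

status = json.loads(QUORUM_STATUS)
print(out_of_quorum(status))  # ['computer06']
print(has_majority(status))   # True
```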

You said that computer05 got compacted, but that computer06 broke?
Given that computer04 is doing fine, it may not be related. If you
gather a log from mon.computer06 trying to start up (with "debug mon =
20" in the config file to dump a lot of output) somebody may be able
to help you.
-Greg
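
For reference, the setting mentioned above would look roughly like this in /etc/ceph/ceph.conf on computer06. Placing it under the [mon] section is an assumption (it is the same section used for mon_compact_on_start earlier in the thread). The monitor has to be started again afterwards so that the startup log includes the extra output.

```ini
[mon]
    debug mon = 20
```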

>
>
>
>> Date: Tue, 31 Mar 2015 12:30:22 -0700
>> Subject: Re: [ceph-users] One of three monitors can not be started
>> From: greg@xxxxxxxxxxx
>> To: zhanghaoyu1988@xxxxxxxxxxx
>> CC: ceph-users@xxxxxxxxxxxxxx
>
>>
>> On Tue, Mar 31, 2015 at 2:50 AM, 张皓宇 <zhanghaoyu1988@xxxxxxxxxxx> wrote:
>> > Who can help me?
>> >
>> > One monitor in my ceph cluster can not be started.
>> > Before that, I added '[mon] mon_compact_on_start = true' to
>> > /etc/ceph/ceph.conf on the three monitor hosts. Then I ran 'ceph tell
>> > mon.computer05 compact' on computer05, which has a monitor on it.
>> > When store.db of computer05 shrank from 108G to 1G, mon.computer06
>> > stopped,
>> > and it has not been able to start since.
>> >
>> > If I start mon.computer06, it hangs in this state:
>> > # /etc/init.d/ceph start mon.computer06
>> > === mon.computer06 ===
>> > Starting Ceph mon.computer06 on computer06...
>> >
>> > The process info is like this:
>> > root 12149 3807 0 20:46 pts/27 00:00:00 /bin/sh /etc/init.d/ceph start
>> > mon.computer06
>> > root 12308 12149 0 20:46 pts/27 00:00:00 bash -c ulimit -n 32768;
>> > /usr/bin/ceph-mon -i computer06 --pid-file
>> > /var/run/ceph/mon.computer06.pid
>> > -c /etc/ceph/ceph.conf
>> > root 12309 12308 0 20:46 pts/27 00:00:00 /usr/bin/ceph-mon -i computer06
>> > --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
>> > root 12313 12309 19 20:46 pts/27 00:00:01 /usr/bin/ceph-mon -i
>> > computer06
>> > --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
>> >
>> > Log on computer06 is like this:
>> > 2015-03-30 20:46:54.152956 7fc5379d07a0 0 ceph version 0.72.2
>> > (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
>> > ...
>> > 2015-03-30 20:46:54.759791 7fc5379d07a0 1 mon.computer06@-1(probing) e4
>> > preinit clean up potentially inconsistent store state
>>
>> So I haven't looked at this code in a while, but I think the monitor
>> is trying to validate that it's consistent with the others. You
>> probably want to dig around the monitor admin sockets and see what
>> state each monitor is in, plus its perception of the others.
>>
>> In this case, I think maybe mon.computer06 is trying to examine its
>> whole store, but 100GB is a lot (way too much, in fact), so this can
>> take a loooong time.
>>
>> >
>> > Sorry, my English is not good.
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >

Attachment: 0902
Description: Binary data

Attachment: 1638
Description: Binary data

