I checked the cluster state, and it has recovered to HEALTH_OK. I don't know why.
Yesterday at 09:02 I started mon.computer06, but it could not start; the log is in attachment 0902. At 16:38 I started mon.computer06 again, and it got stuck again with these processes:

/usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
/usr/sbin/ceph-create-keys -i computer06

But this morning it was simply OK. The log is in attachment 1638. Can anyone explain that?

To: greg@xxxxxxxxxxx
From: zhanghaoyu1988@xxxxxxxxxxx
Subject: Re: [ceph-users] One of three monitors can not be started
Date: Thu, 2 Apr 2015 07:53:19 +0800

It does not respond.
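The 2 April reply says the admin socket on computer06 gives no response, and the process listing quoted further down in this thread shows a `ceph --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.computer06.asok mon_status` call sitting there. A minimal Python sketch, assuming the `ceph` CLI is installed and Python 3.7+, that issues the same admin-socket query but gives up after a timeout, so a hung monitor shows up as `None` instead of a stuck shell. The function names and the 30-second default are illustrative choices, not part of Ceph.

```python
import json
import subprocess
from typing import List, Optional


def build_admin_socket_cmd(asok_path: str, command: str = "mon_status",
                           cluster: str = "ceph",
                           ceph_bin: str = "ceph") -> List[str]:
    """Build the same argv seen in the thread's process listing:
    ceph --cluster=ceph --admin-daemon=<asok> mon_status
    """
    return [ceph_bin, f"--cluster={cluster}",
            f"--admin-daemon={asok_path}", command]


def query_admin_socket(asok_path: str, command: str = "mon_status",
                       timeout: float = 30.0,
                       ceph_bin: str = "ceph") -> Optional[dict]:
    """Run an admin-socket command and return its parsed JSON output.

    Returns None if the daemon does not answer within `timeout` seconds.
    Raises FileNotFoundError if `ceph_bin` is not installed and
    subprocess.CalledProcessError if the command exits non-zero.
    """
    cmd = build_admin_socket_cmd(asok_path, command, ceph_bin=ceph_bin)
    try:
        # subprocess.run kills the child process when the timeout expires.
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout, check=True)
    except subprocess.TimeoutExpired:
        # The socket file exists, but the monitor never replied.
        return None
    return json.loads(result.stdout)
```

For example, `query_admin_socket("/var/run/ceph/ceph-mon.computer06.asok")` would return the parsed `mon_status` dictionary when the monitor answers; its `state` field (the log in this thread shows the monitor in `probing`) indicates where startup is stuck.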
From: Gregory Farnum
Sent: 2015/4/2 1:01
To: 张皓宇
Subject: Re: [ceph-users] One of three monitors can not be started

On Tue, Mar 31, 2015 at 10:25 PM, 张皓宇 <zhanghaoyu1988@xxxxxxxxxxx> wrote:
> There is an asok on computer06.
> I tried to start mon.computer06; about two hours later, mon.computer06 still
> had not started, but there are some different processes on computer06 and I
> don't know how to handle them:
> root 7812 1 0 11:39 pts/4 00:00:00 python /usr/sbin/ceph-create-keys -i computer06

That's a thing that runs on every monitor invocation to make sure the necessary keys are in place; it's just stuck because the monitor isn't working.

> root 11025 1 12 09:02 pts/4 00:32:13 /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf

That's the monitor.

> root 35692 7812 0 12:59 pts/4 00:00:00 python /usr/bin/ceph --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.computer06.asok mon_status

This is an attempt of yours to invoke mon_status on the admin socket. So you're saying the admin socket is there but it's not responding to queries?

>
> I got the quorum_status from another running monitor:
> { "election_epoch": 508,
>   "quorum": [
>         0,
>         1],
>   "quorum_names": [
>         "computer05",
>         "computer04"],
>   "quorum_leader_name": "computer04",
>   "monmap": { "epoch": 4,
>       "fsid": "471483e5-493f-41f6-b6f4-0187c13d156d",
>       "modified": "2014-07-26 09:52:02.411967",
>       "created": "0.000000",
>       "mons": [
>             { "rank": 0,
>               "name": "computer04",
>               "addr": "192.168.1.60:6789\/0"},
>             { "rank": 1,
>               "name": "computer05",
>               "addr": "192.168.1.65:6789\/0"},
>             { "rank": 2,
>               "name": "computer06",
>               "addr": "192.168.1.66:6789\/0"}]}}

And that indicates mon.computer04 and mon.computer05 are working and in a quorum together to make progress. You said that computer05 got compacted, but that computer06 broke? Given that computer04 is doing fine, it may not be related. If you gather a log from mon.computer06 trying to start up (with "debug mon = 20" in the config file to dump a lot of output), somebody may be able to help you.
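Greg's reading of the `quorum_status` output above can be checked mechanically. A small Python sketch, assuming the JSON has the shape shown in this thread (keys `quorum`, `quorum_names` and `monmap.mons`); the helper names are hypothetical. It lists monitors that appear in the monmap but not in the quorum, and checks that the quorum is a strict majority of the monmap, which is why two of three monitors are enough for the cluster to keep making progress.

```python
import json
from typing import List

# quorum_status output copied from this thread (raw string keeps the "\/" escapes).
QUORUM_STATUS = r'''
{ "election_epoch": 508,
  "quorum": [0, 1],
  "quorum_names": ["computer05", "computer04"],
  "quorum_leader_name": "computer04",
  "monmap": { "epoch": 4,
      "fsid": "471483e5-493f-41f6-b6f4-0187c13d156d",
      "modified": "2014-07-26 09:52:02.411967",
      "created": "0.000000",
      "mons": [
            { "rank": 0, "name": "computer04", "addr": "192.168.1.60:6789\/0"},
            { "rank": 1, "name": "computer05", "addr": "192.168.1.65:6789\/0"},
            { "rank": 2, "name": "computer06", "addr": "192.168.1.66:6789\/0"}]}}
'''


def monitors_out_of_quorum(quorum_status_json: str) -> List[str]:
    """Names of monitors listed in the monmap but absent from the quorum."""
    status = json.loads(quorum_status_json)
    in_quorum = set(status["quorum_names"])
    return [mon["name"] for mon in status["monmap"]["mons"]
            if mon["name"] not in in_quorum]


def has_majority(quorum_status_json: str) -> bool:
    """True if the quorum holds a strict majority of the monmap's monitors."""
    status = json.loads(quorum_status_json)
    return len(status["quorum"]) > len(status["monmap"]["mons"]) // 2
```

On the thread's data, `monitors_out_of_quorum(QUORUM_STATUS)` returns `["computer06"]` and `has_majority(QUORUM_STATUS)` is `True` (2 of 3 monitors).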
-Greg

>> Date: Tue, 31 Mar 2015 12:30:22 -0700
>> Subject: Re: [ceph-users] One of three monitors can not be started
>> From: greg@xxxxxxxxxxx
>> To: zhanghaoyu1988@xxxxxxxxxxx
>> CC: ceph-users@xxxxxxxxxxxxxx
>>
>> On Tue, Mar 31, 2015 at 2:50 AM, 张皓宇 <zhanghaoyu1988@xxxxxxxxxxx> wrote:
>> > Who can help me?
>> >
>> > One monitor in my Ceph cluster cannot be started.
>> > Before that, I added '[mon] mon_compact_on_start = true' to
>> > /etc/ceph/ceph.conf on all three monitor hosts. Then I ran 'ceph tell
>> > mon.computer05 compact' on computer05, which has a monitor on it.
>> > When the store.db on computer05 shrank from 108G to 1G, mon.computer06
>> > stopped, and it has not been able to start since then.
>> >
>> > If I start mon.computer06, it gets stuck in this state:
>> > # /etc/init.d/ceph start mon.computer06
>> > === mon.computer06 ===
>> > Starting Ceph mon.computer06 on computer06...
>> >
>> > The process info looks like this:
>> > root 12149 3807 0 20:46 pts/27 00:00:00 /bin/sh /etc/init.d/ceph start mon.computer06
>> > root 12308 12149 0 20:46 pts/27 00:00:00 bash -c ulimit -n 32768; /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
>> > root 12309 12308 0 20:46 pts/27 00:00:00 /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
>> > root 12313 12309 19 20:46 pts/27 00:00:01 /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
>> >
>> > The log on computer06 looks like this:
>> > 2015-03-30 20:46:54.152956 7fc5379d07a0 0 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
>> > ...
>> > 2015-03-30 20:46:54.759791 7fc5379d07a0 1 mon.computer06@-1(probing) e4 preinit clean up potentially inconsistent store state
>>
>> So I haven't looked at this code in a while, but I think the monitor
>> is trying to validate that it's consistent with the others. You
>> probably want to dig around the monitor admin sockets and see what
>> state each monitor is in, plus its perception of the others.
>>
>> In this case, I think maybe mon.computer06 is trying to examine its
>> whole store, but 100GB is a lot (way too much, in fact), so this can
>> take a loooong time.
>>
>> >
>> > Sorry, my English is not good.
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
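Greg's point is that a monitor store of around 100 GB is far too large and may take a long time to examine at startup. A minimal Python sketch for measuring a monitor's `store.db` size, for example before and after `ceph tell mon.<id> compact`. It assumes the default monitor data layout `/var/lib/ceph/mon/<cluster>-<id>/store.db`, which will differ if `mon data` is overridden in ceph.conf; the function names and the 10 GiB threshold are arbitrary illustrative values, not Ceph settings.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under `path` (roughly like `du -sb`)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # does not follow symlinked dirs
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not count symlink targets
            try:
                total += os.path.getsize(full)
            except FileNotFoundError:
                continue  # file removed mid-walk, e.g. by a running compaction
    return total


def mon_store_path(mon_id: str, cluster: str = "ceph") -> str:
    # Assumed default layout: /var/lib/ceph/mon/<cluster>-<id>/store.db
    return os.path.join("/var/lib/ceph/mon", f"{cluster}-{mon_id}", "store.db")


def store_too_large(path: str, threshold_gib: float = 10.0) -> bool:
    """Hypothetical check: flag a store larger than `threshold_gib` GiB."""
    return dir_size_bytes(path) > threshold_gib * 1024 ** 3
```

On a monitor host, `dir_size_bytes(mon_store_path("computer06")) / 1024 ** 3` would give the store size in GiB, which makes a change like the 108G to 1G drop described above easy to confirm.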
Attachment: 0902 (binary data)
Attachment: 1638 (binary data)