There is an admin socket (asok) on computer06. I tried to start mon.computer06, but about two hours later it still has not started. There are now some different processes on computer06, and I don't know how to handle them:

root  7812     1  0 11:39 pts/4 00:00:00 python /usr/sbin/ceph-create-keys -i computer06
root 11025     1 12 09:02 pts/4 00:32:13 /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
root 35692  7812  0 12:59 pts/4 00:00:00 python /usr/bin/ceph --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.computer06.asok mon_status

I got the quorum_status from another running monitor:

{ "election_epoch": 508,
  "quorum": [ 0, 1],
  "quorum_names": [ "computer05", "computer04"],
  "quorum_leader_name": "computer04",
  "monmap": { "epoch": 4,
      "fsid": "471483e5-493f-41f6-b6f4-0187c13d156d",
      "modified": "2014-07-26 09:52:02.411967",
      "created": "0.000000",
      "mons": [
            { "rank": 0,
              "name": "computer04",
              "addr": "192.168.1.60:6789\/0"},
            { "rank": 1,
              "name": "computer05",
              "addr": "192.168.1.65:6789\/0"},
            { "rank": 2,
              "name": "computer06",
              "addr": "192.168.1.66:6789\/0"}]}}

> Date: Tue, 31 Mar 2015 12:30:22 -0700
> Subject: Re: [ceph-users] One of three monitors can not be started
> From: greg@xxxxxxxxxxx
> To: zhanghaoyu1988@xxxxxxxxxxx
> CC: ceph-users@xxxxxxxxxxxxxx
>
> On Tue, Mar 31, 2015 at 2:50 AM, 张皓宇 <zhanghaoyu1988@xxxxxxxxxxx> wrote:
> > Who can help me?
> >
> > One monitor in my ceph cluster can not be started.
> > Before that, I added '[mon] mon_compact_on_start = true' to
> > /etc/ceph/ceph.conf on three monitor hosts. Then I did 'ceph tell
> > mon.computer05 compact ' on computer05, which has a monitor on it.
> > When store.db of computer05 changed from 108G to 1G, mon.computer06 stopped,
> > and it can not be started since then.
> >
> > If I start mon.computer06, it will stop on this state:
> > # /etc/init.d/ceph start mon.computer06
> > === mon.computer06 ===
> > Starting Ceph mon.computer06 on computer06...
> >
> > The process info is like this:
> > root 12149 3807 0 20:46 pts/27 00:00:00 /bin/sh /etc/init.d/ceph start
> > mon.computer06
> > root 12308 12149 0 20:46 pts/27 00:00:00 bash -c ulimit -n 32768;
> > /usr/bin/ceph-mon -i computer06 --pid-file /var/run/ceph/mon.computer06.pid
> > -c /etc/ceph/ceph.conf
> > root 12309 12308 0 20:46 pts/27 00:00:00 /usr/bin/ceph-mon -i computer06
> > --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
> > root 12313 12309 19 20:46 pts/27 00:00:01 /usr/bin/ceph-mon -i computer06
> > --pid-file /var/run/ceph/mon.computer06.pid -c /etc/ceph/ceph.conf
> >
> > Log on computer06 is like this:
> > 2015-03-30 20:46:54.152956 7fc5379d07a0 0 ceph version 0.72.2
> > (a913ded2ff138aefb8cb84d347d72164099cfd60), process ceph-mon, pid 12309
> > ...
> > 2015-03-30 20:46:54.759791 7fc5379d07a0 1 mon.computer06@-1(probing) e4
> > preinit clean up potentially inconsistent store state
>
> So I haven't looked at this code in a while, but I think the monitor
> is trying to validate that it's consistent with the others. You
> probably want to dig around the monitor admin sockets and see what
> state each monitor is in, plus its perception of the others.
>
> In this case, I think maybe mon.computer06 is trying to examine its
> whole store, but 100GB is a lot (way too much, in fact), so this can
> take a loooong time.
>
> >
> > Sorry, my English is not good.
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
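
For anyone comparing monitor states as suggested above, here is a minimal sketch (not an official Ceph tool) that takes quorum_status JSON like the output posted in this thread and lists the monitors that are in the monmap but not in the quorum. The function name mons_out_of_quorum is made up for illustration, and the sample data is only a trimmed copy of the posted output.

```python
import json

# quorum_status output as posted in this thread, trimmed to the fields used below.
# A raw string keeps the "\/" escapes exactly as Ceph printed them.
QUORUM_STATUS = r'''
{ "election_epoch": 508,
  "quorum": [ 0, 1],
  "quorum_leader_name": "computer04",
  "monmap": { "epoch": 4,
      "mons": [
            { "rank": 0, "name": "computer04", "addr": "192.168.1.60:6789\/0"},
            { "rank": 1, "name": "computer05", "addr": "192.168.1.65:6789\/0"},
            { "rank": 2, "name": "computer06", "addr": "192.168.1.66:6789\/0"}]}}
'''


def mons_out_of_quorum(status_json):
    """Return names of monmap members whose rank is missing from "quorum".

    Hypothetical helper: it only compares the rank list against the monmap
    and does not contact any monitor.
    """
    status = json.loads(status_json)
    in_quorum = set(status["quorum"])
    return [mon["name"]
            for mon in status["monmap"]["mons"]
            if mon["rank"] not in in_quorum]


if __name__ == "__main__":
    print(mons_out_of_quorum(QUORUM_STATUS))
```

With the output posted here, this reports computer06 as the only monitor outside the quorum, which matches what the init script and process list show. It says nothing about why the monitor is stuck; that still requires mon_status from the admin socket on computer06.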