Hello,
We ran into a problem with our test cluster after adding monitors. It now seems that our main monitor doesn't want to start anymore. The logs are flooded with:
2013-06-13 11:41:05.316982 7f7689ca4780 7 mon.a@0(leader).osd e2809 update_from_paxos applying incremental 2810
2013-06-13 11:41:05.317043 7f7689ca4780 1 mon.a@0(leader).osd e2809 e2809: 9 osds: 9 up, 9 in
2013-06-13 11:41:05.317064 7f7689ca4780 7 mon.a@0(leader).osd e2809 update_from_paxos applying incremental 2810
Is this accurate? It's applying the *same* incremental over and over again?
etc
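(If you can reproduce this, turning the monitor and paxos debug levels up before the next start attempt would show what that loop is actually doing. Something like the following in ceph.conf should do it; the levels here are only a suggestion:

[mon]
debug mon = 20
debug paxos = 20
debug ms = 1

The resulting log will be large, but it should show where the monitor keeps re-reading that same incremental.)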
When starting it, after a while we get the following error:
service ceph start mon.a
=== mon.a ===
Starting Ceph mon.a on xxxxx...
[22037]: (33) Numerical argument out of domain
failed: 'ulimit -n 8192; /usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf '
Starting ceph-create-keys on xxxx...
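(It may also be easier to see what is failing if you run the monitor in the foreground rather than through the init script, for example:

ceph-mon -i a -d -c /etc/ceph/ceph.conf

The -d flag keeps the daemon in the foreground and sends its log output to stderr, so the error shows up directly on your terminal.)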
Is there a disaster recovery method for monitors? This is just a test environment, so I don't really care about the data, but if something like this happens in a production environment I would like to know how to get it back (if at all possible).
We just upgraded to 0.61.3. Perhaps we ran into a bug. When adding the monitors we just followed this guide:
http://ceph.com/docs/next/rados/operations/add-or-rm-mons/
After adding the monitors we ran into problems, and we tried to fix them with information we could find online. We started playing with the monmap, and I think this is where it went bad.
Started playing with the monmap? Please describe in more detail the steps you took, and the monitors you had at each point.
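(If the remaining monitors can still form a quorum, dumping the monmap they are currently using would help reconstruct that; roughly:

ceph mon getmap -o /tmp/monmap
monmaptool --print /tmp/monmap

That prints the map's epoch and the monitors it contains, which you can compare against what you expected to have at each step.)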
Are your other monitors working? If so, it's easy enough to remove this one, wipe it out, and add it back in. I'm curious about that weird update loop, though, if you can help us look at that.
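Roughly, and assuming the other monitors still have quorum and the default /var/lib/ceph/mon/ceph-a data path (your mon data line didn't come through, so adjust as needed), that remove/wipe/re-add cycle would look something like:

# from a node that can reach the surviving quorum
ceph mon remove a

# on the broken monitor's host
service ceph stop mon.a
mv /var/lib/ceph/mon/ceph-a /var/lib/ceph/mon/ceph-a.bak

# rebuild the monitor from the quorum's current map and keys
# (with auth set to none the keyring is mostly a formality, but the
# documented procedure includes it)
ceph auth get mon. -o /tmp/keyring
ceph mon getmap -o /tmp/monmap
ceph-mon -i a --mkfs --monmap /tmp/monmap --keyring /tmp/keyring

# register it with the cluster again and start it
ceph mon add a xxx.xxx.0.25:6789
service ceph start mon.a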
-Greg
We are running ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)
/etc/ceph/ceph.conf is pretty simple for the monitor:
[global]
auth supported = none
auth cluster required = none
auth service required = none
auth client required = none
public network = xxx.xxx.0.0/24
cluster network = xxx.xxx.0.0/24
mon initial members = xxxxx
[osd]
osd journal size = 1000
[mds.a]
host = xxxxx
devs = /dev/sdb
mds data = "">
[mon.a]
host = xxxxx
mon addr = xxx.xxx.0.25:6789
mon data = "">
etc
Thanks for looking and if you need more info let me know.
Cheers,
Peter
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Software Engineer #42 @ http://inktank.com | http://ceph.com