Re: Disaster recovery of monitor

On Thursday, June 13, 2013, Peter wrote:
Hello,

We ran into a problem with our test cluster after adding monitors. It now seems that our main monitor doesn't want to start anymore. The logs are flooded with:

2013-06-13 11:41:05.316982 7f7689ca4780  7 mon.a@0(leader).osd e2809 update_from_paxos  applying incremental 2810
2013-06-13 11:41:05.317043 7f7689ca4780  1 mon.a@0(leader).osd e2809 e2809: 9 osds: 9 up, 9 in
2013-06-13 11:41:05.317064 7f7689ca4780  7 mon.a@0(leader).osd e2809 update_from_paxos  applying incremental 2810

Is this accurate? It's applying the *same* incremental over and over again?
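If you can reproduce it, running that mon in the foreground with monitor and paxos debugging cranked up should tell us more about the loop; something like this (the -d flag keeps it in the foreground and logs to stderr):

ceph-mon -i a -d --debug-mon 20 --debug-paxos 20 --debug-ms 1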

etc

When we try to start it, after a while we get the following error:

service ceph start mon.a
=== mon.a ===
Starting Ceph mon.a on xxxxx...
[22037]: (33) Numerical argument out of domain
failed: 'ulimit -n 8192;  /usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf '
Starting ceph-create-keys on xxxx...
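That (33) Numerical argument out of domain from the init script doesn't say much on its own. To capture the actual failure, it may help to run the same command the script runs by hand, in the foreground:

ulimit -n 8192
/usr/bin/ceph-mon -i a -c /etc/ceph/ceph.conf -d

and then paste whatever it prints before it dies.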

Is there a disaster recovery method for monitors? This is just a test environment, so I don't really care about the data, but if something like this happens in a production environment I would like to know how to get it back (if at all possible).

We just upgraded to 0.61.3. Perhaps we ran into a bug. When adding the monitors we just followed this guide:

http://ceph.com/docs/next/rados/operations/add-or-rm-mons/

After adding the monitors we ran into problems. We tried to fix them with information we could find online, and we started playing with the monmap; I think this is where it went bad.

Started playing with the monmap? Please describe in more detail the steps you took, and the monitors you had at each point.
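For reference, the usual monmap surgery on a stopped mon looks something like the following; the mon ID, the entry being removed, and the /tmp path are all just examples here:

ceph-mon -i a --extract-monmap /tmp/monmap   # pull out the map this mon has on disk
monmaptool --print /tmp/monmap               # inspect it
monmaptool --rm b /tmp/monmap                # e.g. drop a bad entry
ceph-mon -i a --inject-monmap /tmp/monmap    # push the edited map back in

Knowing which of these you ran, and with which maps, would narrow things down.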

Are your other monitors working? If so, it's easy enough to remove this one, wipe it out, and add it back in. I'm curious about that weird update loop, though, if you can help us look at that.
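Roughly, and assuming the surviving mons still have quorum (the data path is a placeholder for whatever 'mon data' points at; with auth set to none there's no keyring to fetch):

ceph mon remove a
mv /path/to/mon.a/data /path/to/mon.a/data.bak   # get the old store out of the way
ceph mon getmap -o /tmp/monmap
ceph-mon -i a --mkfs --monmap /tmp/monmap
ceph mon add a xxx.xxx.0.25:6789
ceph-mon -i a --public-addr xxx.xxx.0.25:6789

That's the same sequence as the add-or-rm-mons guide, minus the auth step.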
-Greg


We are running ceph version 0.61.3 (92b1e398576d55df8e5888dd1a9545ed3fd99532)

/etc/ceph/ceph.conf is pretty simple for the monitor:

[global]
        auth supported = none
        auth cluster required = none
        auth service required = none
        auth client required = none

        public network = xxx.xxx.0.0/24
        cluster network = xxx.xxx.0.0/24

        mon initial members = xxxxx
[osd]
        osd journal size = 1000

[mds.a]
        host = xxxxx
        devs = /dev/sdb
        mds data = xxxxx
[mon.a]
        host = xxxxx
        mon addr = xxx.xxx.0.25:6789
        mon data = xxxxx
etc
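Since auth is disabled here, one quick sanity check is whether the remaining mons still form a quorum; from any node, point the client at one of the surviving mon addresses directly (the address below is a placeholder):

ceph -m xxx.xxx.0.26:6789 quorum_status
ceph -m xxx.xxx.0.26:6789 -s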

Thanks for looking, and if you need more info, let me know.

Cheers,

Peter


--
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
