Hi Karan,
I resolved it the same way you did. It appears we had a network partition that caused the MON to die.
I'm running 0.72.1.
It would be nice if redeploying wasn't the solution, but if it's simply cleaner to do so, then I will continue along that route.
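[For the archives: the usual alternative to redeploying is to fix the monitor's address in the monmap and inject the corrected map back into the daemon. A rough sketch, assuming the upstart jobs used elsewhere in this thread, that 10.30.66.15:6789 is the intended address, and default paths:

    # export the current map from the healthy quorum
    ceph mon getmap -o /tmp/monmap
    # replace the stale :6800 entry for ceph-mon-2003 with the correct address
    monmaptool --rm ceph-mon-2003 /tmp/monmap
    monmaptool --add ceph-mon-2003 10.30.66.15:6789 /tmp/monmap
    monmaptool --print /tmp/monmap
    # stop the broken mon and overwrite its local copy of the map
    stop ceph-mon id=ceph-mon-2003
    ceph-mon -i ceph-mon-2003 --inject-monmap /tmp/monmap
    start ceph-mon id=ceph-mon-2003

The catch is that the quorum's own map (monmap e2 below) also records ceph-mon-2003 at :6800, so the corrected map would likely need to be injected into each monitor in turn, stopping each one first. That is why the remove-and-re-add route Karan describes is often the cleaner option.]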
I think what's more troubling is that when this occurred we lost all connectivity to the Ceph cluster.
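[Losing all client connectivity with two of three monitors still up is surprising, since quorum should survive a single mon failure. A minimal check during such an event, assuming an admin keyring on the client:

    # quick view: monmap epoch, mon list, and current quorum
    ceph mon stat
    # detailed view, including the election epoch and leader
    ceph quorum_status --format json-pretty

If these hang against all monitors, the partition was wider than a single mon.]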
On Wed, Feb 5, 2014 at 1:11 AM, Karan Singh <ksingh@xxxxxx> wrote:
Hi Greg
I have seen this problem before in my cluster.
- What Ceph version are you running?
- Did you make any change recently in the cluster that resulted in this problem?
You identified it correctly: the only problem is that ceph-mon-2003 is listening on the wrong port; it should listen on port 6789 like the other two monitors. I resolved this by cleanly removing the affected monitor node and adding it back to the cluster, roughly as sketched below.
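[That cycle maps to roughly the following commands. The monitor id and address come from Greg's output below; the data directory is the Ubuntu default and an assumption:

    # stop the daemon and drop the monitor from the cluster map
    stop ceph-mon id=ceph-mon-2003
    ceph mon remove ceph-mon-2003
    # clear the stale local store (assumed default path)
    rm -rf /var/lib/ceph/mon/ceph-ceph-mon-2003
    mkdir -p /var/lib/ceph/mon/ceph-ceph-mon-2003
    # rebuild the store from the current cluster state
    ceph auth get mon. -o /tmp/mon.keyring
    ceph mon getmap -o /tmp/monmap
    ceph-mon -i ceph-mon-2003 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    # register the monitor at the correct address and bring it back
    ceph mon add ceph-mon-2003 10.30.66.15:6789
    start ceph-mon id=ceph-mon-2003

Since the two surviving monitors still hold quorum, "ceph mon remove" and "ceph mon add" can be run from any node with an admin keyring.]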
Regards
Karan
From: "Greg Poirier" <greg.poirier@xxxxxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx
Sent: Tuesday, 4 February, 2014 10:50:21 PM
Subject: [ceph-users] Ceph MON can no longer join quorum

I have a MON that at some point lost connectivity to the rest of the cluster and now cannot rejoin.

Each time I restart it, it looks like it's attempting to create a new MON and join the cluster, but the rest of the cluster rejects it, because the new one isn't in the monmap.

I don't know why it suddenly decided it needed to be a new MON.

I am not really sure where to start.

root@ceph-mon-2003:/var/log/ceph# ceph -s
    cluster 4167d5f2-2b9e-4bde-a653-f24af68a45f8
     health HEALTH_ERR 1 pgs inconsistent; 2 pgs peering; 126 pgs stale; 2 pgs stuck inactive; 126 pgs stuck stale; 2 pgs stuck unclean; 10 requests are blocked > 32 sec; 1 scrub errors; 1 mons down, quorum 0,1 ceph-mon-2001,ceph-mon-2002
     monmap e2: 3 mons at {ceph-mon-2001=10.30.66.13:6789/0,ceph-mon-2002=10.30.66.14:6789/0,ceph-mon-2003=10.30.66.15:6800/0}, election epoch 12964, quorum 0,1 ceph-mon-2001,ceph-mon-2002

Notice ceph-mon-2003:6800

If I try to start ceph-mon-all, it will be listening on some other port...

root@ceph-mon-2003:/var/log/ceph# start ceph-mon-all
ceph-mon-all start/running
root@ceph-mon-2003:/var/log/ceph# ps -ef | grep ceph
root      6930     1 31 15:49 ?        00:00:00 /usr/bin/ceph-mon --cluster=ceph -i ceph-mon-2003 -f
root      6931     1  3 15:49 ?        00:00:00 python /usr/sbin/ceph-create-keys --cluster=ceph -i ceph-mon-2003
root@ceph-mon-2003:/var/log/ceph# ceph -s
2014-02-04 15:49:56.854866 7f9cf422d700  0 -- :/1007028 >> 10.30.66.15:6789/0 pipe(0x7f9cf0021370 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9cf00215d0).fault
    cluster 4167d5f2-2b9e-4bde-a653-f24af68a45f8
     health HEALTH_ERR 1 pgs inconsistent; 2 pgs peering; 126 pgs stale; 2 pgs stuck inactive; 126 pgs stuck stale; 2 pgs stuck unclean; 10 requests are blocked > 32 sec; 1 scrub errors; 1 mons down, quorum 0,1 ceph-mon-2001,ceph-mon-2002
     monmap e2: 3 mons at {ceph-mon-2001=10.30.66.13:6789/0,ceph-mon-2002=10.30.66.14:6789/0,ceph-mon-2003=10.30.66.15:6800/0}, election epoch 12964, quorum 0,1 ceph-mon-2001,ceph-mon-2002

Suggestions?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com