Re: Mon Create currently at the state of probing

Jim Forde <jimf@xxxxxxxxx> · Mon, 19 Jun 2017 17:09:22 +0000

No, I don’t think Ubuntu 14.04 has it enabled by default.
Double checked.
sudo ufw status
Status: inactive.
No other symptoms of a firewall.
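
An inactive ufw does not by itself rule out raw iptables rules or filtering elsewhere on the network, so a direct connection test between the monitors is a useful cross-check. What follows is a minimal Python sketch, assuming the monitors listen on the default Ceph monitor port 6789 and that the short hostnames resolve through the hosts file. The function name `can_connect` is made up for illustration.

```python
import socket

MON_PORT = 6789  # default Ceph monitor port; adjust if ceph.conf overrides it


def can_connect(host, port=MON_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_connect("r710f")` and `can_connect("r710g")` from r710e, and `can_connect("r710e")` from the other two monitors while the new mon daemon is running, would show whether the probing traffic can get through in both directions.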

From: Sasha Litvak [mailto:alexander.v.litvak@xxxxxxxxx]
Sent: Sunday, June 18, 2017 11:10 PM
To: Jim Forde <jimf@xxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] Mon Create currently at the state of probing

Do you have a firewall on the new server by any chance?

On Sun, Jun 18, 2017 at 8:18 PM, Jim Forde <jimf@xxxxxxxxx> wrote:

I have an eight-node Ceph cluster running Jewel 10.2.5:
one ceph-deploy node, four OSD nodes, and three monitor nodes.
The ceph-deploy node is r710T.
OSDs are r710a, r710b, r710c, and r710d.
Mons are r710e, r710f, and r710g.

Name resolution is in Hosts file on each node.

Successfully removed Monitor r710e from cluster
Upgraded ceph-deploy node r710T to Kraken 11.2.0 (ceph -v returns 11.2.0; all other nodes are still 10.2.5)
ceph -s reports HEALTH_OK with 2 mons
Rebuilt r710e with the same OS (Ubuntu 14.04 LTS) and the same IP address.
“ceph-deploy install --release kraken r710e” is successful, with ceph -v returning 11.2.0 on node r710e
“ceph-deploy admin r710e” is successful and puts the keyring in /etc/ceph/ceph.client.admin.keyring
“sudo chmod +r /etc/ceph/ceph.client.admin.keyring”

Everything seems successful to this point.
Then I run
“ceph-deploy mon create r710e” and I get the following:

[r710e][DEBUG ] ********************************************************************************
[r710e][INFO  ] monitor: mon.r710e is currently at the state of probing
[r710e][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.r710e.asok mon_status
[r710e][WARNIN] r710e is not defined in `mon initial members`
[r710e][WARNIN] monitor r710e does not exist in monmap

r710e is in ‘mon initial members’.

It is in the ceph.conf file correctly (it was running before and the parameters have not changed). Public and cluster networks are defined.
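
One way to double-check the `mon initial members` claim mechanically is to parse the ceph.conf being used and look for the hostname exactly as it appears in the warning (lowercase `r710e`). This is a minimal Python sketch under stated assumptions: the file parses as plain INI with a `[global]` section, members are separated by commas or whitespace, and the comparison is an exact, case-sensitive string match. Whether ceph-deploy itself compares this way is an assumption, and the function names are hypothetical.

```python
import configparser
import re


def mon_initial_members(conf_text):
    """Return the `mon initial members` entries from the [global] section."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("global"):
        return []
    for key, value in parser.items("global"):
        # The option may be written with spaces or with underscores.
        if key.replace(" ", "_") == "mon_initial_members":
            return [name for name in re.split(r"[,\s]+", value) if name]
    return []


def is_listed(conf_text, hostname):
    """Exact, case-sensitive membership test (an assumption of this sketch)."""
    return hostname in mon_initial_members(conf_text)
```

Running `is_listed(open("ceph.conf").read(), "r710e")` against both the copy in the ceph-deploy working directory on r710T and the copy in /etc/ceph on r710e would show whether the two files agree.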

It is the same physical server with the same (but freshly installed) OS and same IP address.
Looking at the local daemon mon_status on all three monitors, I see:
r710f and r710g list r710e under “extra_probe_peers”
r710e lists r710f and r710g under “extra_probe_peers”
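
To compare the three monitors side by side, the JSON printed by the `mon_status` admin-socket command shown in the log can be reduced to the few fields that matter for a monitor stuck in probing. A minimal Python sketch, assuming the output carries the usual `name`, `state`, `quorum`, `extra_probe_peers`, and `monmap.mons` fields; the function name `summarize_mon_status` is hypothetical.

```python
import json


def summarize_mon_status(status_json):
    """Pick out the fields relevant to a monitor that stays in 'probing'."""
    status = json.loads(status_json)
    name = status.get("name")
    # Names of the monitors this daemon believes are in the monmap.
    monmap_names = [m.get("name") for m in status.get("monmap", {}).get("mons", [])]
    return {
        "name": name,
        "state": status.get("state"),
        "quorum": status.get("quorum", []),
        "extra_probe_peers": status.get("extra_probe_peers", []),
        "monmap_mons": monmap_names,
        # False here matches the "does not exist in monmap" warning.
        "in_monmap": name in monmap_names,
    }
```

Feeding it the output of the `mon_status` command on each of r710e, r710f, and r710g makes it easy to see whether any monitor's monmap actually contains r710e, or whether it only ever shows up as a probe peer.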

“ceph-deploy purge r710e” and “ceph-deploy purgedata r710e”, followed by a reboot of the two remaining mons, brings the cluster back to HEALTH_OK

Not sure what is going on. Is Ceph allergic to single-node upgrades? I am afraid to push the upgrade to all mons.

What I have done:
Rebuilt r710e with different hardware. Rebuilt with different OS. Rebuilt with different name and IP address. Same result.
I have also restructured NTP. r710T is the NTP server for the cluster (HEALTH_OK prior to updating). I reset all mon nodes to get time from Ubuntu's default NTP sources. Same error.

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
