David,

Thanks for the reply.

The scenario: a monitor node fails for whatever reason (bad blocks on the hard drive, a motherboard failure, whatever).

Procedure: remove the monitor from the cluster, replace the hardware, reinstall the OS, and add the monitor back to the cluster.
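
For anyone following along, the procedure above could be sketched roughly as below. This is only an outline of the steps as described, not a verified fix for the monmap problem discussed in this thread. The `run` and `replace_monitor` helpers and the `DRY_RUN` switch are hypothetical names made up for illustration, and by default the script only prints the commands. Passing `kraken` as the release is an assumption based on the ceph-deploy node already being on Kraken.

```shell
#!/bin/sh
# Sketch only: prints the commands by default (DRY_RUN=1) instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: replace_monitor MON_NAME RELEASE
replace_monitor() {
    mon="$1"
    release="$2"
    # 1. Drop the dead monitor from the cluster (needs an admin keyring).
    run ceph mon remove "$mon"
    # 2. Replace the hardware and reinstall the OS by hand, then:
    # 3. Install the Ceph packages on the rebuilt host from the ceph-deploy node.
    run ceph-deploy install --release "$release" "$mon"
    # 4. Add the host back as a monitor of the existing cluster.
    run ceph-deploy mon add "$mon"
}
```

Calling `replace_monitor r710e kraken` prints the three commands in order. Setting `DRY_RUN=0` would actually run them, so review the output first.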
That is exactly what I did. However, my ceph-deploy node had already been upgraded to Kraken. The goal is not to use this as an upgrade path per se, but to recover from a failed monitor node in a cluster where an upgrade is in progress.

The upgrade notes for Jewel to Kraken say you may upgrade OSDs, Monitors, and MDSs in any order. Perhaps I am reading too much into this, but I took it to mean that I could proceed with the upgrade at my leisure, making sure each node is successfully upgraded before proceeding to the next. The implication is that I can run the cluster with daemons of different versions (at least during the upgrade process).

So that brings me to the problem at hand. What is the correct procedure for replacing a failed monitor node, especially if the failed monitor is a mon_initial_member? Does it have to be the same version as the other monitors in the cluster?

I do have a public network statement in the ceph.conf file. The monitor r710e is listed as one of the mon_initial_members in ceph.conf with the correct IP address, but the error message is:

Also “[r710e][WARNIN] monitor r710e does not exist in monmap”

INFO

ceph.conf
cat /etc/ceph/ceph.conf

monmap
monmaptool: monmap file /tmp/monmap

Status
ceph -s

PS. Tried this too:
ceph mon remove r710e

From: David Turner [mailto:drakonstein@xxxxxxxxx]

Question... Why are you reinstalling the node, removing the mon from the cluster, and adding it back into the cluster to upgrade to Kraken? The upgrade path from 10.2.5 to 11.2.0 is an acceptable upgrade path. If you just needed to reinstall
the OS for some reason, then you can keep the /var/lib/ceph/mon/r710e/ folder intact and not need to remove/re-add the mon to reinstall the OS. Even if you upgraded from 14.04 to 16.04, this would work. You would want to change the upstart file in the daemon's folder to systemd and make sure it works with systemctl just fine, but the daemon itself would be fine.

If you are hell-bent on doing this the hardest way I've ever heard of, then you might want to check out this Note from the docs for adding/removing a mon. Since you are far enough removed from the initial ceph-deploy, you have removed r710e from your configuration, and if you don't have a public network statement in your ceph.conf file... that could be your problem for the probing.

http://docs.ceph.com/docs/kraken/rados/deployment/ceph-deploy-mon/

"Note: When adding a monitor on a host that was not in hosts initially defined with the ceph-deploy new command, a public network statement needs to be added to the ceph.conf file."

On Mon, Jun 19, 2017 at 1:09 PM Jim Forde <jimf@xxxxxxxxx> wrote:
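
One quick way to check for the public network statement that the docs note quoted above refers to is a small grep wrapper like the one below. This is a minimal sketch: `has_public_network` is a hypothetical helper name, and it assumes the option is written on its own line as `public network = ...` (Ceph also accepts underscores in place of spaces in option names, so the pattern allows both). The subnet in the comment is a made-up example.

```shell
#!/bin/sh
# has_public_network FILE
# Returns 0 if FILE contains an uncommented "public network" (or
# "public_network") setting, non-zero otherwise.
# Example of a line that matches (hypothetical subnet):
#   public network = 192.168.1.0/24
has_public_network() {
    grep -Eq '^[[:space:]]*public[ _]network[[:space:]]*=' "$1"
}
```

For example, `has_public_network /etc/ceph/ceph.conf && echo "public network is set"` would report whether the statement is present on the node where it is run.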
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com