Re: Upgrade from Octopus to Pacific cannot get monitor to join

kevin@xxxxxxxxxx · Wed, 27 Jul 2022 19:35:14 +0000

I downgraded the mgr nodes to 15.2.16, then attempted to add a mon running 16.2.9.

Jul 27 19:21:08 ceph-mon2 bash[568]: debug 2022-07-27T19:21:08.878+0000 7f8d28788700 1 mon.ceph-mon2@0(electing) e70 adding peer [v2:x.x.x.236:3300/0,v1:x.x.x.x:6789/0] to list of hints
Jul 27 19:21:08 ceph-mon2 bash[568]: message repeated 39 times: [ debug 2022-07-27T19:21:08.878+0000 7f8d28788700 1 mon.ceph-mon2@0(electing) e70 adding peer [v2:10.1.1.236:3300/0,v1:10.1.1.236:6789/0] to list of hints]

These messages keep flooding the logs and the mons hold repeated elections, but the new 16.2.9 mon never shows up in the cluster.
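To gauge how often a mon logs these hints, the journal lines can be tallied per peer address. The following is a minimal sketch, not a Ceph tool: the function name `count_peer_hints` and the `SAMPLE_LINES` list are made up for illustration, and it assumes the log format shown above, including the syslog "message repeated N times" wrapper.

```python
import re
from collections import Counter

# Matches the peer address vector in "adding peer [...] to list of hints".
HINT_RE = re.compile(r"adding peer (\[[^\]]*\]) to list of hints")
# Matches the syslog wrapper that collapses identical consecutive messages.
REPEAT_RE = re.compile(r"message repeated (\d+) times:")

# Hypothetical sample input: the two log lines quoted in this message.
SAMPLE_LINES = [
    "Jul 27 19:21:08 ceph-mon2 bash[568]: debug 2022-07-27T19:21:08.878+0000 "
    "7f8d28788700 1 mon.ceph-mon2@0(electing) e70 adding peer "
    "[v2:x.x.x.236:3300/0,v1:x.x.x.x:6789/0] to list of hints",
    "Jul 27 19:21:08 ceph-mon2 bash[568]: message repeated 39 times: [ debug "
    "2022-07-27T19:21:08.878+0000 7f8d28788700 1 mon.ceph-mon2@0(electing) e70 "
    "adding peer [v2:10.1.1.236:3300/0,v1:10.1.1.236:6789/0] to list of hints]",
]

def count_peer_hints(lines):
    """Count 'adding peer' hint messages per peer address vector.

    A plain line counts once; a 'message repeated N times' line counts N times.
    Lines without a hint message are ignored.
    """
    counts = Counter()
    for line in lines:
        hint = HINT_RE.search(line)
        if hint is None:
            continue
        repeat = REPEAT_RE.search(line)
        counts[hint.group(1)] += int(repeat.group(1)) if repeat else 1
    return counts

if __name__ == "__main__":
    for peer, n in count_peer_hints(SAMPLE_LINES).most_common():
        print(n, peer)
```

Because the addresses in the two quoted lines are redacted differently, they are tallied as two separate peers here; on unredacted logs the same peer would collapse into one entry.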

ceph versions:

    "mon": {
        "ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)": 2
    },
    "mgr": {
        "ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)": 2

July 27, 2022 10:53 AM, "Tyler Stachecki" <stachecki.tyler@xxxxxxxxx> wrote:

You're supposed to upgrade the mons first...
https://docs.ceph.com/en/quincy/releases/pacific/#upgrading-non-cephadm-clusters

Maybe try downgrading the mgrs back to Octopus? That's a bit of a scary situation.

Tyler

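The "mons first" rule can be checked against the JSON printed by `ceph versions`. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the sample dictionary is reconstructed from the daemon counts given in this thread rather than taken from real command output, and the 16.2.9 build hash is deliberately left out because it does not appear anywhere in the thread.

```python
import re

def major_versions(versions):
    """Map each daemon type to the set of major release numbers it runs."""
    result = {}
    for daemon_type, builds in versions.items():
        if daemon_type == "overall":  # summary entry, not a daemon type
            continue
        majors = set()
        for version_string in builds:
            match = re.search(r"ceph version (\d+)\.", version_string)
            if match:
                majors.add(int(match.group(1)))
        result[daemon_type] = majors
    return result

def daemons_ahead_of_mons(versions):
    """Return daemon types running a newer major release than the oldest mon."""
    majors = major_versions(versions)
    mon_majors = majors.get("mon", set())
    if not mon_majors:
        return []
    oldest_mon = min(mon_majors)
    return sorted(
        daemon_type
        for daemon_type, found in majors.items()
        if daemon_type != "mon" and found and max(found) > oldest_mon
    )

# Hypothetical input mirroring the state described in the original post:
# 3 mons and 15 OSDs on 15.2.16, 2 mgrs on 16.2.9 (hash elided on purpose).
SAMPLE_VERSIONS = {
    "mon": {"ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)": 3},
    "mgr": {"ceph version 16.2.9 (hash elided) pacific (stable)": 2},
    "osd": {"ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)": 15},
}

if __name__ == "__main__":
    print(daemons_ahead_of_mons(SAMPLE_VERSIONS))
```

For the state in the original post this flags the mgrs as running ahead of the mons. In practice the dictionary could come from `json.loads` applied to the output of `ceph versions`.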
On Wed, Jul 27, 2022, 1:24 PM <kevin@xxxxxxxxxx> wrote:

Currently running Octopus 15.2.16, trying to upgrade to Pacific using cephadm.

3 mon nodes running 15.2.16
2 mgr nodes running 16.2.9
15 OSDs running 15.2.16

The mon/mgr nodes run in LXC containers on Ubuntu, with Docker installed from the Docker repo (not the Ubuntu repo). I am using cephadm to remove one of the monitor nodes and then re-add it with version 16.2.9. The monitor runs but never joins the cluster, and it causes the other 2 mon nodes to start flapping. I also tried adding 2 mon nodes (for a total of 5 mons) on bare metal running Ubuntu (again with Docker from the Docker repo); those mons won't join and don't even show up in 'ceph status'.

I can't find anything in the logs explaining why it fails. The Docker container starts and seems to try to join the cluster, but it just sits there without joining. The other two mons start flapping, and eventually I have to stop the new mon. I can add the monitor back by changing the container_image to 15.2.16, and it rejoins the cluster as expected.

The cluster was previously running Nautilus, installed using ceph-deploy.

I tried setting 'mon_mds_skip_sanity true' after reading another post, but it doesn't appear to help.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx