Re: Upgrade from Octopus to Pacific cannot get monitor to join

Gregory Farnum <gfarnum@xxxxxxxxxx> · Thu, 28 Jul 2022 08:03:41 -0700

On Wed, Jul 27, 2022 at 4:54 PM <kevin@xxxxxxxxxx> wrote:
>
> Currently, all of the nodes are running in docker. The only way to upgrade is to redeploy with docker (ceph orch daemon redeploy), which is essentially making a new monitor. Am I missing something?

Apparently. I don't have any experience with Docker, and unfortunately
very little with containers in general, so I'm not sure what process
you need to follow, though. cephadm certainly manages to do it
properly; you want to maintain the existing disk store.

How do you do it for OSDs? Surely you don't throw away an old
OSD, create a new one, and wait for migration to complete before doing
the next...
-Greg
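
To illustrate the in-place approach, here is a minimal Python sketch, not a tested procedure for this cluster. It takes the daemon versions reported in this thread (hard-coded below), lists which daemon types are still behind the target release, and builds the orchestrator command that upgrades existing daemons in place instead of removing and re-adding them. The names `inventory`, `UPGRADE_ORDER`, `pending_upgrades`, and `upgrade_command` are hypothetical; only the "mgr -> mon" part of the order comes from the thread, and placing OSDs after it is an assumption. The command is only built, never executed.

```python
# Daemon counts per version as reported in this thread (hard-coded for
# illustration; on a live cluster `ceph versions` reports similar data).
inventory = {
    "mon": {"15.2.16": 3},
    "mgr": {"16.2.9": 2},
    "osd": {"15.2.16": 15},
}

# Assumed order for this sketch: "mgr -> mon" is from the thread,
# putting "osd" after it is an assumption.
UPGRADE_ORDER = ["mgr", "mon", "osd"]


def parse_version(text):
    """Turn '15.2.16' into (15, 2, 16) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pending_upgrades(inventory, target):
    """Return (daemon_type, count) pairs still running below `target`."""
    target_v = parse_version(target)
    pending = []
    for daemon_type in UPGRADE_ORDER:
        versions = inventory.get(daemon_type, {})
        behind = sum(
            count
            for version, count in versions.items()
            if parse_version(version) < target_v
        )
        if behind:
            pending.append((daemon_type, behind))
    return pending


def upgrade_command(target):
    """Build (but do not run) the cephadm in-place upgrade command."""
    return ["ceph", "orch", "upgrade", "start", "--ceph-version", target]


if __name__ == "__main__":
    for daemon_type, count in pending_upgrades(inventory, "16.2.9"):
        print(f"{count} {daemon_type} daemon(s) still below 16.2.9")
    print("would run:", " ".join(upgrade_command("16.2.9")))
```

The point of the sketch is that the existing mons stay in place and keep their stores while the orchestrator swaps the code underneath them. Whether the command succeeds in the LXC/docker setup described in this thread is not something the sketch can confirm.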

>
> Is there some prep work I could/should be doing?
>
> I want to do a staggered upgrade as noted here (https://docs.ceph.com/en/pacific/cephadm/upgrade/). That says the order for a staggered upgrade is mgr -> mon, etc. But that was not working for me because it said --daemon-types was not supported.
>
> Basically I'm confused about what the 'proper' way to upgrade is, then. I don't see any way to upgrade the 'code' they are running because it's all in docker containers. But maybe I'm missing something obvious.
>
> Thanks
>
> July 27, 2022 4:34 PM, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:
>
> On Wed, Jul 27, 2022 at 10:24 AM <kevin@xxxxxxxxxx> wrote:
>
> Currently running Octopus 15.2.16, trying to upgrade to Pacific using cephadm.
>
> 3 mon nodes running 15.2.16
> 2 mgr nodes running 16.2.9
> 15 OSD's running 15.2.16
>
> The mon/mgr nodes are running in lxc containers on Ubuntu, with docker installed from the docker repo (not the Ubuntu repo). I used cephadm to remove one of the monitor nodes and then re-add it with version 16.2.9. The monitor node runs but never joins the cluster, and the other 2 mon nodes start flapping. I also tried adding 2 mon nodes (for a total of 5 mons) on bare metal running Ubuntu (with docker from the docker repo); those mons won't join and won't even show up in 'ceph status'.
>
> The way you’re phrasing this it sounds like you’re removing existing monitors and adding newly-created ones. That won’t work across major version boundaries like this (at least, without a bit of prep work you aren’t doing) because of how monitors bootstrap themselves and their cluster membership. You need to upgrade the code running on the existing monitors instead, which is the documented upgrade process AFAIK.
> -Greg
>
>
>
> Can't find anything in the logs regarding why it's failing. The docker container starts and seems to try to join the cluster but just sits and doesn't join. The other two start flapping and then eventually I have to stop the new mon. I can add the monitor back by changing the container_image to 15.2.16 and it will re-join the cluster as expected.
>
> The cluster was previously running nautilus installed using ceph-deploy
>
> Tried setting 'mon_mds_skip_sanity true' from reading another post but it doesn't appear to help.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
