Re: Octopus: conversion from ceph-ansible to Cephadm causes unexpected 15.2.15→15.2.13 downgrade for MDSs and RGWs

Hi Florian, hi Guillaume

On 16.12.21 at 14:18, Florian Haas wrote:
> Hello everyone,
>
> my colleagues and I just ran into an interesting situation updating
> our Ceph training course. That course's labs cover deploying a
> Nautilus cluster with ceph-ansible, upgrading it to Octopus (also with
> ceph-ansible), and then converting it to Cephadm before proceeding
> with the upgrade to Pacific.

I'd actually go a different route:

 1. Convert the Nautilus cluster to be containerized.
 2. Upgrade the containerized cluster to Pacific, still with ceph-ansible.
 3. Run the cephadm adopt playbook.

Not because this path is inherently better, but because it's better tested.
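
Roughly, with a ceph-ansible checkout, that would be something like the
following (the inventory file name is just an example, and playbook names
can differ between ceph-ansible versions, so treat this as a sketch):

  # 1. containerize the running Nautilus cluster
  ansible-playbook -i hosts infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml

  # 2. upgrade the containerized cluster to Pacific, still with ceph-ansible
  #    (after switching the branch / image tag accordingly)
  ansible-playbook -i hosts infrastructure-playbooks/rolling_update.yml

  # 3. hand the cluster over to cephadm
  ansible-playbook -i hosts infrastructure-playbooks/cephadm-adopt.yml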

Guillaume, should we recommend this somehow in the ceph-ansible docs?

Best,
Sebastian

>
> When freshly upgraded to Octopus with ceph-ansible, the entire cluster
> is at version 15.2.15. And everything that is then being adopted into
> Cephadm management (with "cephadm adopt --style legacy") gets
> containers running that release. So far, so good.
>
> When we've completed the adoption process for MGRs, MONs, and OSDs, we
> proceed to redeploying our MDSs and RGWs, using "ceph orch apply mds"
> and "ceph orch apply rgw". Here, what we end up with is a bunch of
> MDSs and RGWs running on 15.2.13. Since the cluster previously ran
> Ansible-deployed 15.2.15 MDSs and RGWs, that makes this a partial (and
> very unexpected) downgrade.
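
(For anyone reproducing this: the commands involved are the plain ones from
the adoption guide, roughly as below; host, filesystem and realm/zone names
are made up for the example:

  cephadm adopt --style legacy --name mon.node1
  cephadm adopt --style legacy --name mgr.node1
  cephadm adopt --style legacy --name osd.0
  ceph orch apply mds cephfs --placement=3
  ceph orch apply rgw default default --placement=3

and it is the two "ceph orch apply" calls that end up on 15.2.13.)
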
>
> The docs at https://docs.ceph.com/en/octopus/cephadm/adoption/ do
> state that we can use "cephadm --image <image>" to set the image. But
> we don't actually need that when we invoke cephadm directly ("cephadm
> adopt" does pull the correct image). Rather we'd need to set the
> correct image for deployment by "ceph orch apply", and there doesn't
> seem to be a straightforward way to do that.
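
(Concretely, something like

  cephadm --image quay.io/ceph/ceph:v15.2.15 adopt --style legacy --name mon.node1

with a made-up daemon name; that side works, it is only "ceph orch apply"
that has no equivalent knob.)
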
>
> I suppose that this can be worked around in a couple of ways:
>
> * by following the documentation and then running "ceph orch upgrade
> start --ceph-version 15.2.15" immediately after;
> * by running "ceph orch daemon redeploy", which does support an
> --image parameter (but is per-daemon, thus less convenient than
> running through a rolling update).
>
> But I'd argue that none of those additional steps should actually be
> necessary — rather, "ceph orch apply" should just deploy the correct
> (latest) version without additional user involvement.
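
For the record, those two workarounds come down to roughly this; the daemon
name in the second command is only an example, the real one comes from
"ceph orch ps":

  # rolling upgrade back to the release the cluster already ran
  ceph orch upgrade start --ceph-version 15.2.15

  # or per daemon, with an explicit image
  ceph orch daemon redeploy rgw.default.default.node1.abcdef --image quay.io/ceph/ceph:v15.2.15
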
>
> The documentation seems to suggest another approach, namely to use an
> updated service spec, but unfortunately that won't work as we can't
> set "image" that way. Example for the rgw service:
>
> ---
> # rgw.yml
> service_type: rgw
> service_id: default.default
> placement:
>   count: 3
> image: "quay.io/ceph/ceph:v15"
> ports:
>   - 7480
>
> # ceph orch apply -i rgw.yml
> Error EINVAL: ServiceSpec: __init__() got an unexpected keyword
> argument 'image'
Right, this won't work.
>
> So, we're curious what's the correct way to ensure that "ceph orch
> apply" installs the latest Octopus release for MDSs and RGWs being
> redeployed as part of a Cephadm cluster conversion. Or is this simply
> a bug somewhere in the orchestrator that would need fixing?
>
> Cheers,
> Florian
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
