Hi everyone,

As part of keeping our Ceph course updated, we recently went through the *experimental* process of upgrading a Cephadm-managed cluster from Ubuntu Jammy to Noble. Note that at this point there are no community-built Ceph packages available for Noble, though there *are* packages available directly from the Ubuntu mirrors. We figured this shouldn't make too much of a difference, considering the Ceph deployment is fully containerised anyway. We did eventually manage to upgrade the cluster successfully, with minimal disruption.
That said, DO NOT consider the rest of this message to be an instruction for how to do this in production. Rather, it's meant to highlight some quirks that — I think — still need ironing out. Again: for the time being, DO NOT RUN THIS IN PRODUCTION, please and thank you. :)
What follows is a walkthrough of the process, and I would ask for feedback on it. Specifically, if you notice an error or omission, please point it out.
Each one of these steps needs to be repeated for every node in the cluster. It assumes a working Cephadm-managed cluster on Ubuntu Jammy with Podman, running Ceph Squid. All commands should be run as root.
==

0. If the host you're about to upgrade is a MON, ensure that your orchestrator placement policy allows the mon service to move to a different host. (This ensures that throughout your OS upgrade, you always have 3 MONs available.)
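One way to express such a policy is a label-based mon service spec, which leaves the orchestrator free to place the three MON daemons on any suitably labelled host rather than pinning them to an explicit host list. (The label name "mon" and the filename below are illustrative, not taken from our cluster; adapt them to yours.)

```yaml
# mon-spec.yaml -- apply with "ceph orch apply -i mon-spec.yaml"
service_type: mon
placement:
  count: 3
  label: mon   # any host carrying the "mon" label is eligible
```

With a spec like this, labelling a spare host with "mon" is enough to give the orchestrator somewhere to move a MON while the original host is down.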
1. Disable orchestrator scheduling for the affected node: "ceph orch host label add <host> _no_schedule".
2. Wait for services (except OSDs) to be redeployed.
3. Check with "ceph orch host ok-to-stop <host>" to see whether the host reports "presumed safe to stop". (This may require failing over the active mgr with "ceph mgr fail".)
4. Set the noout flag for that host: "ceph osd set-group noout <host>".
5. Check with "ceph health detail" to see if it says "host <host> has flags noout".
6. Shell into the host.
7. Make sure the host is fully updated on Jammy: "apt update && apt upgrade -y". Reboot the host if necessary.
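To decide whether that reboot is actually needed, you can check for the standard Debian/Ubuntu marker file that package hooks create after kernel or core-library updates (a simple sketch, not Ceph-specific):

```shell
# Ubuntu drops this marker file when an installed update requires a reboot.
if [ -f /var/run/reboot-required ]; then
    needs_reboot=yes
else
    needs_reboot=no
fi
echo "needs_reboot=${needs_reboot}"
```

If it reports yes, reboot before moving on to the next step.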
8. Stop all remaining Ceph services (at this point, this should only be OSDs) with "systemctl stop ceph.target".
9. If your host has the cephadm package installed from a Ceph community repo, manually create the "/var/lib/cephadm" directory. This is required by the Ubuntu cephadm package, which during the OS upgrade will be cross-upgraded from a different package source. Failure to create this directory will cause this package upgrade to fail. I presume that this issue will go away whenever the Ceph community repos start including packages for Noble.
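A sketch of that workaround; creating the directory is harmless if it already exists:

```shell
# The Ubuntu cephadm package upgrade fails if this directory is missing
# (the community-repo package does not create it). mkdir -p is a no-op
# when the directory already exists.
mkdir -p /var/lib/cephadm
```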
10. Run "do-release-upgrade" confirming all steps *except* the automated reboot at the end (opt out of that).
11. Remove the MongoDB_Compass profile from the /etc/apparmor.d directory. (If left in place, it will cause CEPHADM_REFRESH_FAILED errors.)
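A minimal way to do this (the profile filename is as observed on our test hosts; verify with "ls /etc/apparmor.d" before removing anything):

```shell
# Remove the offending AppArmor profile. "rm -f" succeeds even if the
# file is already absent; the reboot in a later step reloads AppArmor.
rm -f /etc/apparmor.d/MongoDB_Compass
```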
12. Remove any legacy ceph-osd packages, if installed on the host: "apt remove ceph-osd". (Failure to do so will keep your OSD containers from starting.)
13. Reboot the host.
14. Re-enable orchestrator scheduling with "ceph orch host label rm <host> _no_schedule".
15. Unset the noout flag for the host: "ceph osd unset-group noout <host>".
16. Check overall cluster health with "ceph -s" and "ceph health detail" before proceeding to the next node.
==

Once this has been carried out for each Cephadm-managed Ubuntu node, you should have a reasonable expectation of having a working Ceph/Noble test cluster.
Again: do not run this in production. Feedback on the above steps, however, is much appreciated. Please let us know your thoughts.
Cheers, Florian
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx