We went through this exercise, though our starting point was Ubuntu 16.04 / nautilus. We reduced our double builds as follows:

1. Rebuild each monitor host on 18.04/bionic and rejoin, still on nautilus
2. Upgrade all mons, mgrs (and, optionally, rgws) to pacific
3. Convert each mon, mgr, and rgw to cephadm and enable the orchestrator
4. Rebuild each mon, mgr, and rgw on 20.04/focal and rejoin the pacific cluster
5. Drain and rebuild each osd host on focal and pacific

This has the advantage of only having to drain and rebuild the OSD hosts once. Double building the control cluster hosts isn’t so bad, and the orchestrator makes all of the ceph parts easy once it’s enabled.

The biggest challenge we ran into was https://tracker.ceph.com/issues/51652, because we still had a lot of filestore osds. It’s frustrating, but we managed to get through it without much client interruption on a dozen prod clusters, most of which had 38 osd hosts and 912 total osds each.

One thing that helped: before beginning the osd host builds, set the primary-affinity of all of the old osds to something <1. This way, when the new pacific (or octopus) osds join the cluster, they will automatically be favored as primary on their pgs.

If a heartbeat timeout storm starts to get out of control, start by setting nodown and noout. The flapping osds are the worst. Then figure out which osds are the culprits and restart them.

Hopefully your nautilus osds are all bluestore and you won’t have this problem. We put up with it because the filestore to bluestore conversion was one of the most important parts of this upgrade for us.

Best of luck, whatever route you take.

Regards,
Josh Beaman

From: Götz Reinicke <goetz.reinicke@xxxxxxxxxxxxxxx>
Date: Tuesday, August 1, 2023 at 1:01 PM
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject: [EXTERNAL] Upgrading nautilus / centos7 to octopus / ubuntu 20.04. - Suggestions and hints?
Hi,

I’ve read and thought a lot about this migration, as it is a bigger project, and I was wondering if anyone has done it already and might share some notes or playbooks, because in everything I read some parts were missing or hard for me to understand.

I have a few different approaches in mind, so maybe you have some suggestions or hints.

a) Upgrade nautilus on CentOS 7 with the few missing features like dashboard and prometheus. After that, migrate one node after another to Ubuntu 20.04 with octopus, and then upgrade ceph to the most recent stable version.

b) Migrate one node after another to Ubuntu 18.04 with nautilus, then upgrade to octopus, and after that to Ubuntu 20.04.

or

c) Upgrade one node after another to Ubuntu 20.04 with octopus and join it to the cluster until all nodes are upgraded.

As a test I tried c) with a mon node, but adding it to the cluster fails with some failed state, still probing for the other mons. (I don’t have the right log at hand right now.)

So my questions are:

a) What would be the best (most stable) migration path, and
b) is it in general possible to add a new octopus mon (not an upgraded one) to a nautilus cluster where the other mons are still on nautilus?

I hope my thoughts and questions are understandable :)

Thanks for any hint and suggestion.

Best,
Götz
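
For anyone who wants to script the primary-affinity and nodown/noout steps mentioned in the reply at the top of this thread, here is a minimal sketch in Python. It only builds the `ceph` command lines and does not run them. The helper names, the example osd ids, and the affinity value of 0.5 are assumptions for illustration; the reply only says "something <1". Check the commands against your own release before using them.

```python
import shlex


def primary_affinity_commands(old_osd_ids, affinity=0.5):
    """Build `ceph osd primary-affinity` commands for the old osds.

    The default of 0.5 is an assumed example value; any value below 1
    lets new osds (which default to 1) be preferred as primary.
    """
    if not 0.0 <= affinity < 1.0:
        raise ValueError("affinity must be >= 0 and < 1")
    return [
        ["ceph", "osd", "primary-affinity", f"osd.{osd_id}", str(affinity)]
        for osd_id in old_osd_ids
    ]


def flag_commands(action, flags=("nodown", "noout")):
    """Build `ceph osd set|unset <flag>` commands for the given flags."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return [["ceph", "osd", action, flag] for flag in flags]


if __name__ == "__main__":
    # Hypothetical osd ids; substitute the ids of your old osds.
    commands = primary_affinity_commands([0, 1, 2]) + flag_commands("set")
    for cmd in commands:
        # Print for review instead of executing anything.
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them, or paste them into a shell, before touching a production cluster. Remember to unset nodown and noout again (`flag_commands("unset")`) once the flapping osds have been dealt with.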