Re: Time to Upgrade from Nautilus

On Sat, Oct 14, 2023 at 5:14 PM Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
>
> Hello.
>
> It's been a while.  For the past couple years I've had a cluster running
> Nautilus on Debian 10 using the Debian Ceph packages, and deployed with
> Ceph-Ansible.  It's not a huge cluster - 10 OSD nodes with 80 x 12TB HDD
> OSDs, plus 3 management nodes, and about 40% full at the moment - but it is
> a critical resource for one of our researchers.
>
> Back then I had some misgivings about non-Debian packages and also about
> containerized Ceph.   I don't know if my feelings about these things have
> changed that much, but it's time to upgrade, and, with the advent of
> cephadm it looks like it's just better to stay mainstream.

I know it's probably the last thing you want to hear while staring down
an upgrade of a critical research cluster running an outdated version
of Ceph -- but Debian 10 is *old*. You may be at the end of the road on
Nautilus until you upgrade the OS.

You can see this in the fact that download.ceph.com provides buster
builds for Nautilus, but not for Pacific:
https://download.ceph.com/debian-nautilus/dists/
https://download.ceph.com/debian-pacific/dists/
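
(The dists/ pages are plain directory listings, so you can check this
from the shell too -- something along these lines, give or take the
exact HTML:)

    for rel in nautilus octopus pacific quincy; do
        echo "== $rel =="
        # list the Debian/Ubuntu codenames the repo actually carries
        curl -s "https://download.ceph.com/debian-${rel}/dists/" \
            | grep -oE 'href="[a-z]+/"' | sort -u
    done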

Containerization would let you stay on Debian 10 by running Debian
11/12 containers on top of it. That said, I'm sure you know that
security updates for Debian 10 ended in 06/22... not to mention that
the Debian 10 kernel is getting long in the tooth, and containerization
won't help you there.
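
To see what I mean about the kernel: containers share the host's
kernel, so a newer userland image doesn't buy you a newer kernel.
Assuming docker (or podman) is on the box:

    uname -r                              # host kernel: Debian 10's 4.19.x series
    docker run --rm debian:11 uname -r    # same 4.19.x kernel from inside the container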

If you do proceed with containerization, I would recommend
containerizing your existing Nautilus cluster before trying to upgrade.
Trying to containerize an existing cluster and upgrade it at the same
time is just asking for trouble. Ceph upgrades are hard enough as it is
without anything else moving around!
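
For reference, if and when you do move to cephadm, the adoption flow
from the "convert an existing cluster to cephadm" docs looks roughly
like this on each host. Just a sketch: it assumes a container runtime
is already installed, and as far as I know cephadm only manages
Octopus-or-newer daemons, so the adopt step would come once you're at
least that far along.

    # see which legacy (package-deployed) daemons cephadm can find here
    cephadm ls

    # adopt them one by one into containers (names are placeholders --
    # use whatever `cephadm ls` actually reports):
    cephadm adopt --style legacy --name mon.$(hostname -s)
    cephadm adopt --style legacy --name mgr.$(hostname -s)
    cephadm adopt --style legacy --name osd.12   # repeat per OSD id on the host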

> So I'm looking for advice on how to get from where I'm at to at least
> Pacific or Quincy.

Be aware that you should not upgrade across more than two releases at
a time, per the official documentation. If you're on Nautilus, you're
looking at Pacific at most unless you want to gamble a bit:
https://docs.ceph.com/en/latest/releases/quincy/#upgrading-from-pre-octopus-releases-like-nautilus
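
Whichever hop you make, it's worth checking where the cluster actually
is before and after each step -- something like this (the commands are
standard; the pacific target is just the example here):

    ceph versions                              # every daemon's release, grouped by type
    ceph mon dump | grep min_mon_release       # minimum release the mons will accept
    ceph osd dump | grep require_osd_release   # current OSD compatibility level

    # and only once *every* daemon reports the new release:
    ceph osd require-osd-release pacific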

>  I've read a little in the last couple days.  I've seen various opinions on
> (not) skipping releases and on when to switch to cephadm.  I'm also
> concerned about cleaning up those old Debian packages - will there be a
> point where I can 'apt-get purge' them without harming the cluster.
>
> One particular thing:  The upgrade instructions in various places on
> docs.ceph.com say something like
>
> Upgrade monitors by installing the new packages and restarting the monitor
> daemons.

So the catch here is that you have to upgrade Ceph components in a
precise order -- all mons first, then all mgrs, etc. I imagine the
disclaimer you're reading is aimed at the fact that lots of folks run
both mons and mgrs on the same host. If you apt-get dist-upgrade one
such "mon/mgr" host, you would effectively have a mgr of release N+1
running before all of the mons have been upgraded to N+1. The
*best-case* scenario is that the mgr simply won't start until the rest
of the mons are upgraded.
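
On a package-based cluster, the manual sequence looks roughly like this
(a sketch only; it assumes co-located mon/mgr hosts as above and that
the new release's apt sources are already in place -- and check on one
host first whether the package upgrade restarts daemons on its own):

    # on each mon host, one at a time: pull the new packages, then
    # restart only the mon and wait for it to rejoin quorum
    apt-get update && apt-get dist-upgrade
    systemctl restart ceph-mon.target
    ceph quorum_status | grep quorum_names    # confirm this mon is back in quorum

    # only after *all* mons are on the new release, restart the mgrs:
    systemctl restart ceph-mgr.target
    ceph versions                             # mons and mgrs should now show the new release

    # then OSD hosts (one host or failure domain at a time), then MDS, then RGW.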

> To me this is kind of vague.  Perhaps there is a different concept of
> 'packages' within the cephadm environment.  I could really use some
> clarification on this.
>
> I'd also consider decommissioning a few nodes, setting up a new cluster on
> fresh Debian installs, and migrating the data and remaining nodes.  This
> would be a long and painful process - decommission a node, move it, move
> some data, decommission another node - and I don't know what effect it
> would have on external references to our object store.

I think you may be overlooking something here -- there's no need to
move data to do that. You just set noout, rebuild the host, and then
run `ceph-volume lvm activate --all` once you're back up on Debian 11.
That command scans the LVM volumes for OSDs and props everything back
up for you -- systemd units and all. There will be some recovery
activity to restore the objects that were written while the host was
down, but it should be fairly minimal.
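
In rough terms, per host (the OS reinstall in the middle is whatever
your normal provisioning process is):

    ceph osd set noout                  # don't rebalance while the host is down
    # ... reinstall the host on Debian 11 and put the *same* Ceph
    #     release's packages back on it ...
    ceph-volume lvm activate --all      # finds the OSD LVs and recreates the systemd units
    ceph osd unset noout
    ceph -s                             # watch the brief recovery of anything written meanwhile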

> Please advise.

I am more than willing to extend my expertise to my alma mater here --
let me know if there's any way I can help. I have a heap of experience
in upgrading Ubuntu/Ceph clusters with no downtime.

> Thanks.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdhall@xxxxxxxxxxxxxx
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



