On Sat, Oct 14, 2023 at 5:52 PM Tyler Stachecki <stachecki.tyler@xxxxxxxxx> wrote:
>
> On Sat, Oct 14, 2023 at 5:14 PM Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
> >
> > Hello.
> >
> > It's been a while. For the past couple of years I've had a cluster running Nautilus on Debian 10, using the Debian Ceph packages and deployed with ceph-ansible. It's not a huge cluster - 10 OSD nodes with 80 x 12TB HDD OSDs, plus 3 management nodes, and about 40% full at the moment - but it is a critical resource for one of our researchers.
> >
> > Back then I had some misgivings about non-Debian packages and also about containerized Ceph. I don't know if my feelings about these things have changed that much, but it's time to upgrade, and with the advent of cephadm it looks like it's just better to stay mainstream.
>
> I know it's probably the last thing you want to hear while looking down the barrel of a gun with a research cluster running an outdated version of Ceph -- but Debian 10 is *old*. You may be at the end of the road on Nautilus until you upgrade.
>
> You can see this in that, e.g., download.ceph.com provides buster builds for Nautilus but not Pacific:
> https://download.ceph.com/debian-nautilus/dists/
> https://download.ceph.com/debian-pacific/dists/

Caught myself here - it's Octopus that is the end of the road for buster. The ceph.com builds for Quincy require bullseye/Debian 11.

> Containerization would let you stay on Debian 10 by running Debian 11/12 containers on top of it. That being said, I'm sure you know that security updates for Debian 10 ended 06/22... not to mention that the Debian 10 kernel is getting long in the tooth, and containerization won't help you there.
>
> If you do proceed with containerization, I would recommend containerizing your existing Nautilus cluster before trying to upgrade. Trying to containerize an existing cluster while also upgrading it is just asking for trouble. Ceph upgrades without anything else moving around can be hard enough as it is!
>
> > So I'm looking for advice on how to get from where I'm at to at least Pacific or Quincy.
>
> Be aware that you should not upgrade across more than 2 releases at a time, per the official documentation. If you're on Nautilus, you're looking at Pacific tops unless you want to gamble a bit:
> https://docs.ceph.com/en/latest/releases/quincy/#upgrading-from-pre-octopus-releases-like-nautilus
>
> > I've read a little in the last couple of days. I've seen various opinions on (not) skipping releases and on when to switch to cephadm. I'm also concerned about cleaning up those old Debian packages - will there be a point where I can 'apt-get purge' them without harming the cluster?
> >
> > One particular thing: The upgrade instructions in various places on docs.ceph.com say something like
> >
> > Upgrade monitors by installing the new packages and restarting the monitor daemons.
>
> So the wrinkle here is that you have to upgrade Ceph components in a precise order - all mons first, then all mgrs, etc. I imagine the disclaimer you're reading is oriented towards the fact that lots of folks run both mons and mgrs on the same host. If you apt-get dist-upgrade one "mon/mgr" host, you would effectively have a mgr of release N+1 running before all of the mons have been upgraded to N+1. The *best-case* scenario is that the mgr simply will not start until the rest of the mons are upgraded.
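While I'm replying to myself anyway: for what it's worth, here is roughly what that ordering looks like on a package-based (non-cephadm) cluster. Treat it as a sketch of the flow rather than the full documented procedure -- the package names and the 'pacific' target at the end are assumptions about where you end up landing:

    # keep OSDs in the map while daemons restart during the upgrade
    ceph osd set noout

    # 1) mons, one host at a time: upgrade that host's mon packages,
    #    restart only the mon, and confirm before touching the next host
    apt-get install --only-upgrade ceph-mon ceph-common
    systemctl restart ceph-mon.target
    ceph versions    # wait until all mons report the new release

    # 2) then mgrs, 3) then OSDs host by host, 4) then MDS/RGW --
    #    same pattern: upgrade that host's packages, then restart
    systemctl restart ceph-mgr.target
    systemctl restart ceph-osd.target

    # once every daemon reports the new release:
    ceph osd require-osd-release pacific
    ceph osd unset noout

The point being: "installing the new packages" and "restarting the daemons" are two separate steps, and the restart order is the part you actually have to be careful about.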
> > To me this is kind of vague. Perhaps there is a different concept of 'packages' within the cephadm environment. I could really use some clarification on this.
> >
> > I'd also consider decommissioning a few nodes, setting up a new cluster on fresh Debian installs, and migrating the data and remaining nodes. This would be a long and painful process - decommission a node, move it, move some data, decommission another node - and I don't know what effect it would have on external references to our object store.
>
> I think you may be overlooking something here -- there's no need to move data to do that. You just set noout, rebuild a host, and then run `ceph-volume lvm activate --all` once you're back up on Debian 11. That command will scan the host's LVM volumes for OSDs and prop everything back up for you -- systemd units and all. There will be some recovery activity to restore degraded objects that received writes while the host was being rebuilt, but it should be fairly minimal.
>
> > Please advise.
>
> I am more than willing to extend my expertise to my alma mater here -- let me know if there's any way I can help. I have a heap of experience in upgrading Ubuntu/Ceph clusters with no downtime.
>
> > Thanks.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdhall@xxxxxxxxxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx