Re: Jewel -> Luminous upgrade, package install stopped all daemons

Gregory Farnum <gfarnum@xxxxxxxxxx> · Fri, 15 Sep 2017 22:49:33 +0000

On Fri, Sep 15, 2017 at 3:34 PM David Turner <drakonstein@xxxxxxxxx> wrote:
I don't understand a single use case where I want updating my packages using yum, apt, etc to restart a ceph daemon.  ESPECIALLY when there are so many clusters out there with multiple types of daemons running on the same server.
My home setup is 3 nodes each running 3 OSDs, a MON, and an MDS server.  If upgrading the packages restarts all of those daemons at once, then I'm mixing MON versions, OSD versions and MDS versions every time I upgrade my cluster.  It removes my ability to methodically upgrade my MONs, OSDs, and then clients.

Now let's take the Luminous upgrade, which REQUIRES you to upgrade all of your MONs before anything else... I'm screwed.  I literally can't perform the upgrade if it's going to restart all of my daemons, because it is impossible for me to achieve a Paxos quorum of MONs running the Luminous binaries BEFORE I upgrade any other daemon in the cluster.  The only way to achieve that is to stop the entire cluster and every daemon, upgrade all of the packages, then start the MONs, then start the rest of the cluster again... There is no way that is a desired behavior.

All of this is ignoring large clusters using something like Puppet to manage their package versions.  I want to just be able to update the ceph version and push that out to the cluster.  It will install the new packages on the entire cluster, and then my automated scripts can perform a rolling restart of the cluster, upgrading all of the daemons while ensuring that the cluster is healthy every step of the way.  I don't want to add in the time of installing the packages on every node DURING the upgrade.  I want that done before I initiate my script, so that the cluster is in a mixed-version state for as little time as possible.
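As a rough illustration of the ordering such a rolling-restart script would follow, here is a minimal Python sketch. It only computes the restart order and does not talk to Ceph. The function name `plan_rolling_restart`, the host names, and the daemon IDs are hypothetical, and the MON, OSD, MDS order is an assumption based on the description above rather than an official procedure.

```python
from typing import Dict, List, Tuple

# Assumed order: all MONs first (the Luminous requirement mentioned above),
# then OSDs, then MDS daemons. Adjust to the release notes you follow.
DAEMON_ORDER = ["mon", "osd", "mds"]


def plan_rolling_restart(hosts: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (host, daemon_id) pairs in the order they should be restarted.

    Daemons are grouped by type first and by host second, so no OSD or MDS
    is restarted until every MON in the cluster has been restarted.
    """
    plan: List[Tuple[str, str]] = []
    for daemon_type in DAEMON_ORDER:
        for host in sorted(hosts):
            for daemon in hosts[host]:
                # Daemon IDs are assumed to look like "<type>.<name>".
                if daemon.split(".", 1)[0] == daemon_type:
                    plan.append((host, daemon))
    return plan


# Hypothetical layout modelled on the 3-node setup described earlier:
# each node runs one MON, one MDS, and three OSDs.
cluster = {
    f"node{n}": [f"mon.node{n}", f"mds.node{n}"]
    + [f"osd.{3 * (n - 1) + i}" for i in range(3)]
    for n in (1, 2, 3)
}

plan = plan_rolling_restart(cluster)
for host, daemon in plan:
    print(f"{host}: restart {daemon}, then wait for the cluster to be healthy")
```

A real script would replace the `print` with an explicit restart of that one daemon followed by a health check, which only works if installing the packages did not already restart everything.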

Claiming that a Ceph daemon being restarted by anything other than a command issued specifically to restart it is anything but an undesirable bug sounds crazy to me.  I don't ever want anything restarting my Ceph daemons that is not explicitly called to do so.  That just sounds like it's begging to put my entire cluster into a world of hurt by accidentally restarting too many daemons at the same time, making the data in my cluster inaccessible.

I'm used to the Ubuntu side of things.  I've never seen upgrading the Ceph packages affect a daemon before.  If that's actually a thing that is done on purpose in RHEL and CentOS... good riddance! That's ridiculous!

I don't know what the settings are right now, or what the latest argument was to get them there.

But we *have* had distributions require us to make changes to come into compliance with their packaging policies.
Some users *do* want their daemons to automatically restart on upgrade, because if you have segregated nodes that you're managing by hand, it's a lot easier to issue one command than two.
And on and on and on.

Personally, I tend closer to your position. But this is a thing that some people get very vocal about; we don't have a lot of upstream people interested in maintaining packaging or fighting with other interest groups who say we're doing it wrong; and it's just not a lot of fun to deal with.

Looking through the git logs, I think CEPH_AUTO_RESTART_ON_UPGRADE was probably added so distros could easily make that distinction. And it would not surprise me if the use of SELinux required restarts: upgrading packages tends to change what the daemon's SELinux policy allows it to do, and if the running daemon and the new policy disagree, I presume SELinux is going to complain wildly...
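For anyone who wants to check that setting before pushing packages out, a minimal Python sketch follows. It parses shell-style KEY=value text and reports whether CEPH_AUTO_RESTART_ON_UPGRADE is switched on. The helper name `auto_restart_enabled` is hypothetical, and two things are assumptions to verify against your own packages: that the variable lives in a sysconfig-style file such as /etc/sysconfig/ceph, and that the literal value "yes" is what enables restarts.

```python
def auto_restart_enabled(sysconfig_text: str) -> bool:
    """Return True if CEPH_AUTO_RESTART_ON_UPGRADE is set to "yes".

    Only simple KEY=value lines are understood; comments and blank lines
    are skipped. If the variable is set more than once, the last one wins,
    as it would when a shell sources the file.
    """
    value = ""
    for raw in sysconfig_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "CEPH_AUTO_RESTART_ON_UPGRADE":
            # Drop optional surrounding quotes: "yes", 'yes', yes
            value = val.strip().strip("\"'")
    # Assumption: only the exact string "yes" enables restarts.
    return value == "yes"


# Hypothetical file contents, for illustration only.
sample = """
# Restart daemons when packages are upgraded?
CEPH_AUTO_RESTART_ON_UPGRADE=no
"""

print(auto_restart_enabled(sample))
```

To check a real host, read the file (for example with `open(path).read()`) and pass the text in; an unset variable is treated as "do not restart".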
-Greg