Re: upgrade procedure to Luminous

Sage Weil <sage@xxxxxxxxxxxx> · Fri, 14 Jul 2017 15:18:54 +0000 (UTC)

On Fri, 14 Jul 2017, Lars Marowsky-Bree wrote:
> On 2017-07-14T14:12:08, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> 
> > > Any thoughts on how to mitigate this, or on whether I got this all wrong and
> > > am missing a crucial detail that blows this wall of text away, please let me
> > > know.
> > I don't know; the requirement that mons be upgraded before OSDs doesn't 
> > seem that unreasonable to me.  That might be slightly more painful in a 
> > hyperconverged scenario (osds and mons on the same host), but it should 
> > just require some admin TLC (restart mon daemons instead of 
> > rebooting).
> 
> I think it's quite unreasonable, to be quite honest. Collocated MONs
> with OSDs is very typical for smaller cluster environments.

Yes, but how many of those clusters can only upgrade by updating the 
packages and rebooting?  Our documented procedures have always recommended 
upgrading the packages, then restarting either mons or OSDs first, and to 
my recollection nobody has complained.  TBH my first encounter with the 
"reboot on upgrade" procedure in the Linux world was with Fedora (which I 
just recently switched to for my desktop)--and FWIW it felt very 
anachronistic.

But regardless, the real issue is that this is a trade-off between the testing 
and software complexity burden vs user flexibility.  Enforcing an upgrade 
order means we have less to test and have greater confidence the user 
won't see something we haven't.  It also means, in this case, that we can 
rip out a ton of legacy code in luminous without having to keep 
compatibility workarounds in place for another whole LTS cycle (a year!).  
That reduces code complexity, improves quality, and improves velocity.  
The downside is that the upgrade procedure has to be done in a particular 
order.

Honestly, though, I think it is a good idea for operators to be 
careful with their upgrades anyway.  They should upgrade just the mons, let 
the cluster stabilize, and make sure things are okay (e.g., no new 
health warnings saying they have to 'ceph osd set sortbitwise') before 
continuing.
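The ordering described above can be sketched as a simple pre-flight check. It covers all mons first, then a clean health check, then OSDs at a slower pace. This is a hypothetical Python model, not a Ceph tool. The `Daemon` record, the `next_upgrade_batch` helper and the `health_ok` flag are illustrative assumptions; a real operator would read daemon versions and health status from the cluster itself.

```python
from dataclasses import dataclass


# Hypothetical model of a cluster daemon; not a Ceph API.
@dataclass
class Daemon:
    kind: str      # "mon" or "osd"
    name: str
    version: str   # release the daemon is currently running


def next_upgrade_batch(daemons, target, health_ok):
    """Return the daemons that may be restarted on `target` next.

    Assumed policy: every mon must run `target` before any OSD is
    restarted, and OSD restarts wait for a clean health check.
    """
    pending_mons = [d for d in daemons if d.kind == "mon" and d.version != target]
    if pending_mons:
        # Restart the mons one after the other, before touching any OSD.
        return pending_mons[:1]
    if not health_ok:
        # Mons are done, but let the cluster settle (e.g. clear any new
        # health warnings) before moving on to the OSDs.
        return []
    # OSDs can be rolled through at a slower pace.
    return [d for d in daemons if d.kind == "osd" and d.version != target]
```

For example, with one jewel mon still pending, the helper returns only that mon, even though a jewel OSD is also waiting.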

Also, although I think it's a good idea to do the mon upgrade relatively 
quickly (one after the other until they are upgraded), the OSD upgrade can 
be stretched out longer.  (We do pretty thorough thrashing tests with 
mixed-version OSD clusters, but go through the mon upgrades pretty 
quickly.)

> > Is there something in some distros that *requires* a reboot in order to 
> > upgrade packages?
> 
> Not necessarily.
> 
> *But* once we've upgraded the packages, a failure or reboot might
> trigger this.

True, but this is rare, and even so the worst that can happen in this 
case is that the OSDs don't come up until the other mons are upgraded.  If 
the admin plans to upgrade the mons in succession without lingering with 
mixed-version mons, the worst-case downtime window is very small, and it 
only kicks in if *more than one* of the mon nodes fails (taking out OSDs in 
more than one failure domain).
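The "more than one mon node" condition follows from ordinary replication arithmetic. Here is a minimal sketch under stated assumptions. Each mon is colocated with OSDs on its own host, and each host is a separate failure domain. Pools are replicated with `size=3` and `min_size=2`; these are common settings, but assumptions here. A rebooted, already-upgraded host keeps its OSDs down until the last mon is upgraded. A placement group therefore goes inactive only once the number of down failure domains exceeds `size - min_size`. The function name `pgs_stay_active` is hypothetical.

```python
def pgs_stay_active(down_failure_domains: int, size: int = 3, min_size: int = 2) -> bool:
    """Return True if a PG with one replica per failure domain still has
    at least `min_size` replicas up.

    Worst case assumed: every down failure domain holds a replica of the PG,
    so each down domain removes one replica (but never more than `size`).
    """
    replicas_up = size - min(down_failure_domains, size)
    return replicas_up >= min_size
```

With these assumed defaults, one rebooted mon/OSD host leaves every PG active. A second one can leave some PGs below `min_size` until the remaining mons are upgraded.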

> And customers don't always upgrade all nodes at once in a short period
> (the benefit of a supposed rolling upgrade cycle), increasing the risk.

I think they should plan to do this for the mons.  We can make a note 
stating as much in the upgrade procedure docs?

> I wish we'd already be fully containerized so indeed the MONs were truly
> independent of everything else going on on the cluster, but ...

Indeed!  Next time around...

> > Also, this only seems like it will affect users that are getting their 
> > ceph packages from the distro itself and not from a ceph.com channel or a 
> > special subscription/product channel (this is how the RHEL stuff works, I 
> > think).
> 
> Even there, upgrading only the MON daemons and not the OSDs is tricky?

I mean you would upgrade all of the packages, but only restart the mon 
daemons.  The deb packages have skipped the auto-restart in the postinst 
(or whatever) stage for years.  I'm pretty sure the rpms do the same?

Anyway, does that make sense?  Yes, it means that you can't just reboot in 
succession if your mons are mixed with OSDs.  But this time adding that 
restriction let us do the SnapSet and snapdir conversion in a single 
release, which is a *huge* win and will let us rip out a bunch of ugly OSD 
code.  We might not have a need for it next time around (and can try to 
avoid it), but I'm guessing something will come up and it will again be a 
hard call to make balancing between sloppy/easy upgrades vs simpler 
code...

sage
