Re: would people mind a slow osd restart during luminous upgrade?

David Turner <drakonstein@xxxxxxxxx> · Thu, 09 Feb 2017 17:12:09 +0000

When we upgraded our QA cluster from Hammer 0.94.7 to Jewel 10.2.3 we had issues with client incompatibility.  We first tried upgrading our clients before upgrading the cluster.  This broke creating RBDs, cloning RBDs, and probably many other things.  We quickly called that test a wash, redeployed the cluster back to 0.94.7, and redid the upgrade by partially upgrading the cluster, testing, fully upgrading the cluster, testing, and finally upgrading the clients to Jewel.  This worked with no issues creating RBDs, cloning, snapshots, deleting, etc.
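
As a rough sketch, the RBD checks run between upgrade stages could be scripted along these lines. The pool name, image name, and both helper functions are placeholders made up for illustration, not what our QA cluster actually runs; the `rbd` subcommands are the standard ones, but check them against the version you have installed.

```python
import subprocess


def smoke_test_commands(pool="rbd", image="upgrade-smoke"):
    """Build the rbd commands for one create/snapshot/clone/delete pass.

    The pool and image names are hypothetical placeholders.
    """
    base = f"{pool}/{image}"
    snap = f"{base}@snap1"
    clone = f"{base}-clone"
    return [
        # Format 2 images are needed for cloning.
        ["rbd", "create", "--size", "128", "--image-format", "2", base],
        ["rbd", "snap", "create", snap],
        ["rbd", "snap", "protect", snap],
        ["rbd", "clone", snap, clone],
        # Clean up in reverse order: clone, then snapshot, then image.
        ["rbd", "rm", clone],
        ["rbd", "snap", "unprotect", snap],
        ["rbd", "snap", "rm", snap],
        ["rbd", "rm", base],
    ]


def run_smoke_test(dry_run=True):
    """Print the commands, or run them and stop at the first failure."""
    for cmd in smoke_test_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run only: shows what would be executed after each upgrade stage.
    run_smoke_test(dry_run=True)
```

Running the same pass after the partial cluster upgrade, after the full cluster upgrade, and again after the client upgrade makes it easy to see at which stage a client operation starts failing.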

I'm not sure if there was a previous reason that we decided to always upgrade the clients first.  It might have had to do with the upgrade from Firefly to Hammer.  It's just something we always test now, especially with full version upgrades.  That being said, it would be great if the release notes named a client version that was regression tested throughout the cluster upgrade.

On Thu, Feb 9, 2017 at 7:29 AM Sage Weil <sweil@xxxxxxxxxx> wrote:
On Thu, 9 Feb 2017, David Turner wrote:

> The only issue I can think of is if there isn't a version of the clients
> fully tested to work with a partially upgraded cluster or a documented
> incompatibility requiring downtime. We've had upgrades where we had to
> upgrade clients first and others that we had to do the clients last due to
> issues with how the clients interacted with an older cluster, partially
> upgraded cluster, or newer cluster.

We maintain client compatibility across *many* releases and several
years.  In general this is under the control of the administrator via their
choice of CRUSH tunables, which effectively let you choose the oldest
client you'd like to support.
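
To make the tunables point concrete, here is a minimal sketch that reads the JSON printed by `ceph osd crush show-tunables` and pulls out the active profile and the oldest release it implies. The field names `profile` and `minimum_required_version` are what that command is assumed to report, so verify them against your own cluster's output; the helper name and the sample string are hypothetical.

```python
import json


def oldest_client_hint(show_tunables_json):
    """Return (profile, minimum_required_version) from show-tunables JSON.

    Field names are assumed; missing fields come back as None.
    """
    tunables = json.loads(show_tunables_json)
    return tunables.get("profile"), tunables.get("minimum_required_version")


# Hand-written sample trimmed to two fields; real output has many more.
sample = '{"profile": "firefly", "minimum_required_version": "firefly"}'
print(oldest_client_hint(sample))
```

On a live cluster the input would come from the command itself rather than a literal string. Changing the profile with `ceph osd crush tunables <profile>` can trigger data movement, so it is worth checking the current values first.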

I'm curious which upgrade you had problems with?  Generally speaking the
only "client" upgrade ordering issue is with the radosgw clients, which
need to be upgraded after the OSDs.

> If the FileStore is changing this much, I can imagine a Jewel client having
> a hard time locating the objects it needs from a Luminous cluster.

In this case the change would be internal to a single OSD and have no
effect on the client/osd interaction or placement of objects.

sage

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com