On 04/18/2013 05:28 PM, Gregory Farnum wrote:
On Wed, Apr 17, 2013 at 7:40 AM, Guido Winkelmann
<guido@xxxxxxxxxxxxxxxxx> wrote:
Hi,
I just tried upgrading parts of our experimental Ceph cluster from 0.56.1 to
0.60, and it looks like the new mon daemon from 0.60 cannot talk to the
0.56.1 ones at all.
Long story short, we had to move some hardware around and during that time I
had to shrink the cluster to one single machine. My plan was to expand it to
three machines again, so that I would again have 3 mons and 3 osds, as before.
I just installed the first new machine, going straight for 0.60, but leaving
the remaining old one at 0.56.1. I added the new mon to the monmap according
to the documentation and started the new mon daemon, but the mon cluster
wouldn't achieve quorum. In the logs for the new mon, I saw the following line
repeated many times:
0 -- 10.6.224.129:6789/0 >> 10.6.224.131:6789/0 pipe(0x2da5ec0 sd=20 :37863
s=1 pgs=0 cs=0 l=0).connect protocol version mismatch, my 10 != 9
The old mon had no such lines in its log.
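A quick, purely illustrative Python sketch for spotting which monitor logs contain that message (the log path below is just the usual default and may need adjusting for your setup):

#!/usr/bin/env python3
# Illustrative only: scan monitor logs for the "protocol version mismatch"
# message quoted above. The log location is an assumption; point it at
# wherever your ceph-mon daemons actually write their logs.
import glob

LOG_GLOB = "/var/log/ceph/ceph-mon.*.log"   # assumed default log location

for path in glob.glob(LOG_GLOB):
    with open(path, errors="replace") as f:
        hits = [line.rstrip() for line in f if "protocol version mismatch" in line]
    if hits:
        print(f"{path}: {len(hits)} mismatch lines, e.g.")
        print("  " + hits[-1])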
I could only solve this by shutting down the old mon and upgrading it to 0.60
as well.
It looks to me like this means rolling upgrades without downtime won't be
possible from bobtail to cuttlefish. Is that correct?
If the cluster is in good shape, this shouldn't actually result in downtime.
Do a rolling upgrade of your monitors; once a majority of them are on
Cuttlefish, they'll switch over and form a quorum. The only "downtime" is the
period each store needs to update, which shouldn't be long, and during it only
the monitors are inaccessible (unless the upgrade takes a truly ridiculous
amount of time). You can do rolling upgrades on all the rest of the daemons
just the same as before.
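To make the monitor part concrete, here is a very rough Python sketch of driving such a rolling upgrade and waiting for the quorum to re-form. None of it comes from this thread: the monitor names, the manual upgrade hook, and the exact shape of the ceph quorum_status JSON (a "quorum" list plus "monmap"/"mons") are assumptions you would want to check against your own setup.

#!/usr/bin/env python3
# Rough illustration only: walk through the monitors one at a time and wait
# for quorum to re-form once a majority is on the new version. Host names,
# the manual upgrade step and the quorum_status JSON fields are assumptions.
import json
import subprocess
import time

MON_HOSTS = ["mon-a", "mon-b", "mon-c", "mon-d", "mon-e"]  # hypothetical
MAJORITY = len(MON_HOSTS) // 2 + 1

def upgrade_mon(host):
    # Placeholder: upgrade the packages on `host` and restart its ceph-mon
    # with whatever tooling you use (ssh, puppet, ...); this just waits.
    input(f"upgrade and restart the mon on {host}, then press enter... ")

def mons_in_quorum():
    """Return the number of monitors in quorum, or None if nobody answers."""
    try:
        out = subprocess.run(["ceph", "quorum_status"],
                             capture_output=True, check=True, timeout=10).stdout
        return len(json.loads(out)["quorum"])
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, ValueError):
        return None  # no quorum yet, so the cluster can't answer

for i, host in enumerate(MON_HOSTS, start=1):
    upgrade_mon(host)
    if i < MAJORITY:
        # Not enough upgraded mons yet; they sit outside the quorum while
        # the old ones keep serving.
        print(f"{i}/{len(MON_HOSTS)} upgraded, majority not reached yet")
        continue
    n = mons_in_quorum()
    while n is None or n < MAJORITY:
        time.sleep(5)
        n = mons_in_quorum()
    print(f"quorum re-formed with {n} monitors after upgrading {host}")

The point it tries to capture is the one above: the first minority of upgraded monitors simply sit out, and quorum only comes back once a majority is running the new version.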
Another potential source of delay would be the synchronization process
triggered when a majority of monitors have been upgraded.
Say you have 5 monitors.
You upgrade two while the cluster is happily running: their stores are
converted, which may take a while if the stores are huge [1], and those two
monitors are then ready to join the quorum as soon as a third one is
upgraded.
During this time, your cluster keeps on going, with more versions being
created.
Then you decide to upgrade the third monitor. It goes through the same period
of downtime as the other two (which, as Greg said, shouldn't be long, but may
be if your stores are huge [1]), and this will be the bulk of your downtime.
However, because the cluster has kept going in the meantime, there's a chance
that the first two upgraded monitors will have fallen out of sync with the
more recent cluster state. That will trigger a store sync, which shouldn't
take long either, but it is somewhat bound by the store size and the number
of versions created in between. You might even be lucky enough to get through
the whole thing quickly and find that the sync isn't necessary at all
(there's another mechanism that handles catch-up when the monitors haven't
drifted that far).
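If you want to watch a freshly upgraded monitor go through this phase, you can poll its admin socket. The following is only an illustrative Python sketch; the socket path, the mon_status admin-socket command and the exact state names are assumptions of mine, so check them against your release.

#!/usr/bin/env python3
# Illustrative sketch: poll a monitor's admin socket until it is back in
# quorum. Socket path, the mon_status command and the state names
# ("probing", "synchronizing", "electing", "leader", "peon") are assumptions.
import json
import subprocess
import time

MON_ID = "a"  # hypothetical monitor id
SOCK = f"/var/run/ceph/ceph-mon.{MON_ID}.asok"

while True:
    out = subprocess.run(["ceph", "--admin-daemon", SOCK, "mon_status"],
                         capture_output=True, check=True).stdout
    state = json.loads(out).get("state", "unknown")
    print(f"mon.{MON_ID} state: {state}")
    if state in ("leader", "peon"):
        break  # back in quorum; any sync/catch-up phase is over
    time.sleep(5)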
Anyway, when you finally upgrade that third monitor (out of 5), it is going
to break quorum, so it would probably be wise to just upgrade the remaining
monitors all at once.
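In case the arithmetic isn't obvious, here's a tiny illustrative snippet; none of it touches a real cluster:

# Spelling out the quorum arithmetic in the 5-monitor example above.
n_mons = 5
majority = n_mons // 2 + 1           # 3 of 5 needed for quorum
for upgraded in range(n_mons + 1):
    old = n_mons - upgraded          # old and new mons can't share a quorum
    if old >= majority:
        side = "the old monitors"
    elif upgraded >= majority:
        side = "the upgraded monitors"
    else:
        side = "nobody"
    print(f"{upgraded} upgraded / {old} old -> quorum held by {side}")
# While the third monitor is down converting its store you have 2 old + 2 new,
# i.e. the "nobody" case: that is the quorum break mentioned above, and also
# why you may as well upgrade the last two monitors right away.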
[1] - With the new leveldb tuning this might not even be an issue.
-Joao
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com