[re-adding ML, so others may benefit]

On Tue, 7 Mar 2017 13:14:14 -0700 Mike Lovell wrote:

> On Mon, Mar 6, 2017 at 8:18 PM, Christian Balzer <chibi at gol.com> wrote:
>
> > On Mon, 6 Mar 2017 19:57:11 -0700 Mike Lovell wrote:
> >
> > > has anyone on the list done an upgrade from hammer (something later
> > > than 0.94.6) to jewel with a cache tier configured? i tried doing one
> > > last week and had a hiccup with it. i'm curious if others have been
> > > able to successfully do the upgrade and, if so, did they take any
> > > extra steps related to the cache tier?
> > >
> > It would be extremely helpful for everybody involved if you could be a
> > bit more specific than "hiccup".
> >
> the problem we had was that osds in the cache tier were crashing and it
> made the cluster unusable for a while.
> http://tracker.ceph.com/issues/19185 is a tracker issue i made for it.
> i'm guessing not many others have seen the same issue. i'm just wondering
> if others have successfully done an upgrade with an active cache tier and
> how things went.
>
Yeah, I saw that a bit later; it looks like you found/hit a genuine bug.

> > I've upgraded one crappy test cluster from hammer to jewel w/o issues
> > and am about to do that on a more realistic, busier test cluster as
> > well.
> >
I did upgrade that other test cluster, the one that had actual traffic
(to/through the cache) going on during the upgrade, without any issues.

Maybe Kefu Chai can comment on why this is not something seen by everyone.
One thing I can think of is that I didn't change any defaults, in
particular "hit_set_period".

> > OTOH, I have no plans to upgrade my production Hammer cluster with a
> > cache tier at this point.
> >
> interesting. do you not have plans just because you are still testing? or
> is there just no desire or need to upgrade?
>
All of the above.

That (small) cluster is serving 9 compute nodes, and that whole
installation has reached its maximum build-out; it will NOT grow any
further.

Hammer is working fine, and nobody involved is interested in upgrading
things willy-nilly (which would involve the compute nodes at some point as
well) for a service that needs to be as close to 24/7 as possible.

While I would like to eventually replace the old HW one batch at a time
about 3-4 years down the line, and thus would require "current" SW,
migrating everything off that installation and starting fresh is also an
option.

If you do an upgrade of a compute node, you can live-migrate things away
from it first, and if it doesn't pan out, no harm done.

If you run into a "hiccup" with a Ceph upgrade (especially one that
doesn't manifest itself immediately on the first MON/OSD being upgraded),
your whole installation, with (in my case) hundreds of VMs, is dead in the
water, depending on the exact circumstances for a prolonged period.

Not a particularly sunny or career-enhancing prospect.

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi at gol.com         Global OnLine Japan/Rakuten Communications
http://www.gol.com/
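
P.S.: For anyone wanting to compare notes before attempting such an
upgrade, the hit_set settings on the cache pool can be inspected with
something along these lines (the pool name "cache" here is just a
placeholder for your actual cache pool):

    # check whether the hit_set defaults were changed on the cache pool
    ceph osd pool get cache hit_set_type
    ceph osd pool get cache hit_set_period
    ceph osd pool get cache hit_set_count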
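
P.P.S.: By live-migrating things away I mean the usual libvirt/qemu dance,
roughly something like the following (domain and destination host names
are of course just examples):

    # move the guest off the node about to be upgraded, no downtime
    virsh migrate --live vm01 qemu+ssh://node02/system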