On Tue, Sep 4, 2018 at 3:41 PM, Vasu Kulkarni <vakulkar@xxxxxxxxxx> wrote:
> On Tue, Sep 4, 2018 at 12:31 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> There is no plan or expectation (on my part at least) to support
>> downgrades across major versions (mimic -> luminous). (IMO the complexity
>> that would need to be introduced to allow this is not worth the
>> investment.)
>>
>> However, there *is* an interest in testing (and being able to support)
>> downgrades between minor versions. For example, if you're running 13.2.2,
>> and start rolling out 13.2.3 but things misbehave, it would be nice to be
>> able to roll back to 13.2.2.
>
> I am personally -1 on downgrading the software for minor versions too.
> If for some reason, say, 13.2.3 is not working on a specific system, I think
> the ideal thing would be to stop the rolling upgrade at that stage and revert
> the node back to its original state with minimal impact (a couple of OSDs in
> the down state until a fresh install of 13.2.2 restores them).

That assumes you are able to detect a problem while upgrading. What if
you actually need to start using the cluster to detect a problem that
would benefit from a downgrade?

> I think adding more tests to cover upgrade scenarios rather than
> downgrade cases would be more helpful.

That always sounds like a great idea.

>> So.. what do we need to do to allow this?
>>
>> 1. Create a test suite that captures the downgrade cases. We could start
>> with a teuthology facet for the initial version and have another facet for
>> the target version. Teuthology can't enforce a strict ordering (i.e.,
>> always a downgrade), but it's probably just as valuable to also test the
>> upgrade cases too. The main challenge I see here is that we are regularly
>> fixing bugs in the stable releases; since the tests are against older
>> releases, problems we uncover are often things that we can't "fix" since
>> it's existing code.
>>
>> It will probably (?) be the case that in the end we have known issues with
>> downgrades for specific versions.
>>
>> What kind of workloads should we run?
>>
>> Should we repurpose the p2p suite to do this? Right now it steps through
>> every stable release in sequence. Instead, we could add N facets that
>> upgrade (or downgrade) between versions, and make sure that N >= the total
>> number of point releases. This would be more like an upgrade/downgrade
>> "thrasher" in that case...
>>
>> 2. Consider downgrades when backporting any changes to stable releases.
>> If we are adding fields to data structures, they need to work both in the
>> upgrade case (which we're already good at) and the downgrade case.
>> Usually the types of changes that alter behavior happen in the
>> major releases, but occasionally we make these changes in stable releases
>> too.
>>
>> I can't actually think of a stable branch change that would be problematic
>> right now... hopefully that's a good sign!
>>
>> Other thoughts?
>> sage
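
To make point 2 above concrete, here is a minimal, self-contained sketch of the
versioned-encoding idea (illustrative only; it does not use the real Ceph
ENCODE_START/DECODE_START macros, and all names and version numbers are made
up for the example). The point it shows: if a point release adds a field but
leaves the compat version alone, the older decoder can still read the blob by
skipping the trailing bytes it doesn't understand, so a downgrade keeps working;
bumping the compat version is what would break it.

```cpp
// Sketch of length-prefixed, versioned encoding (not the actual Ceph macros).
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>
#include <iostream>

// Append a POD value to a byte buffer.
template <typename T>
void put(std::vector<uint8_t>& buf, const T& v) {
  const uint8_t* p = reinterpret_cast<const uint8_t*>(&v);
  buf.insert(buf.end(), p, p + sizeof(T));
}

// Read a POD value at offset, advancing the offset.
template <typename T>
T get(const std::vector<uint8_t>& buf, size_t& off) {
  T v;
  std::memcpy(&v, buf.data() + off, sizeof(T));
  off += sizeof(T);
  return v;
}

// Hypothetical "13.2.3" encoder: struct_v bumped to 2 for the new field,
// but compat_v stays at 1 so older decoders are still allowed to read it.
std::vector<uint8_t> encode_v2(uint32_t old_field, uint32_t new_field) {
  std::vector<uint8_t> buf;
  put<uint8_t>(buf, 2);                 // struct_v
  put<uint8_t>(buf, 1);                 // compat_v: oldest decoder that can cope
  std::vector<uint8_t> body;
  put<uint32_t>(body, old_field);
  put<uint32_t>(body, new_field);       // field added in the point release
  put<uint32_t>(buf, static_cast<uint32_t>(body.size()));  // length prefix
  buf.insert(buf.end(), body.begin(), body.end());
  return buf;
}

// Hypothetical "13.2.2" decoder: only understands struct_v 1, but tolerates
// newer encodings as long as compat_v <= 1, skipping the bytes it can't parse.
uint32_t decode_v1(const std::vector<uint8_t>& buf) {
  size_t off = 0;
  uint8_t struct_v = get<uint8_t>(buf, off);
  uint8_t compat_v = get<uint8_t>(buf, off);
  if (compat_v > 1)
    throw std::runtime_error("encoding too new");  // this is what breaks downgrades
  uint32_t len = get<uint32_t>(buf, off);
  size_t end = off + len;
  uint32_t old_field = get<uint32_t>(buf, off);
  (void)struct_v;   // struct_v > 1 means extra fields follow; we just skip them
  off = end;
  return old_field;
}

int main() {
  auto blob = encode_v2(42, 7);          // written by the newer point release
  std::cout << decode_v1(blob) << "\n";  // still readable after rolling back: 42
}
```

So the backport review question is less "did we add a field?" and more "did we
raise the compat requirement, or change the meaning of existing fields?" --
those are the changes that would need to show up as known issues in a
downgrade suite.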