On Wed, Sep 5, 2018 at 1:30 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Tue, 4 Sep 2018, Vasu Kulkarni wrote:
>> On Tue, Sep 4, 2018 at 12:31 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>> > There is no plan or expectation (on my part at least) to support
>> > downgrades across major versions (mimic -> luminous). (IMO the complexity
>> > that would need to be introduced to allow this is not worth the
>> > investment.)
>> >
>> > However, there *is* an interest in testing (and being able to support)
>> > downgrades between minor versions. For example, if you're running 13.2.2,
>> > and start rolling out 13.2.3 but things misbehave, it would be nice to be
>> > able to roll back to 13.2.2.
>>
>> I am personally -1 on downgrading the software for minor versions too,
>> If for some reason say 13.2.3 is not working on a specific system, I
>> think the ideal thing could be to stop rolling upgrade at that stage and
>> revert the node back to original state with minimal impact (couple osd's
>> in down state, until fresh install of 13.2.2 restores it)
>
> I don't think it is realistic to expect users to upgrade in a way that
> lets them catch any issues with new versions before they go beyond a
> single failure domain (e.g., one host or rack). Some issues might
> not even manifest until they do. And even if they did do a single failure
> domain and notice the issue, they would be stuck running with a degraded
> system (or have to do a big rebalance) until a proper fix is available.

For a single failure domain, I actually meant the same downgrade here,
but in a different way: uninstall and reinstall the old version, restore
the mon DB, and bring the OSDs back faster (like an OS reinstall case).
If the problem can be detected only after all upgrades, then it's a
different case.

>
> FWIW we already get flak from customers because we don't test and support
> this... I think this is a when, not an if.
>
>> I think adding more tests to cover upgrade scenarios rather than
>> downgrade cases will be more helpful.
>
> We should do that too. Any feedback on what types of upgrade scenarios we
> should cover that we currently don't would be helpful...

I will look into the current cases, but mostly a mix of
filestore/bluestore/EC for rgw/rbd/fs workloads in continuous online
mode. If some of them are outside the upgrade suites, we can probably
bring them into the same suite. With downgrades in the picture, the
number of test cases will be 2x.

>
> Thanks!
> sage
>
>
>>
>> >
>> > So.. what do we need to do to allow this?
>> >
>> > 1. Create a test suite that captures the downgrade cases. We could start
>> > with a teuthology facet for the initial version and have another facet for
>> > the target version. Teuthology can't enforce a strict ordering (i.e.,
>> > always a downgrade) but it's probably just as valuable to also test the
>> > upgrade cases too. The main challenge I see here is that we are regularly
>> > fixing bugs in the stable releases; since the tests are against older
>> > releases, problems we uncover are often things that we can't "fix" since
>> > it's existing code.
>> >
>> > It will probably (?) be the case that in the end we have known issues with
>> > downgrades with specific versions.
>> >
>> > What kind of workloads should we run?
>> >
>> > Should we repurpose the p2p suite to do this? Right now it steps through
>> > every stable release in sequence. Instead, we could add N facets that
>> > upgrade (or downgrade) between versions, and make sure that N >= the total
>> > number of point releases. This would be more like an upgrade/downgrade
>> > "thrasher" in that case...
>> >
>> > 2. Consider downgrades when backporting any changes to stable releases.
>> > If we are adding fields to data structures, they need to work both in the
>> > upgrade case (which we're already good at) and the downgrade case.
>> > Usually the types of changes that make some behavior change happen in the
>> > major releases, but occasionally we make these changes in stable releases
>> > too.
>> >
>> > I can't actually think of a stable branch change that would be problematic
>> > right now... hopefully that's a good sign!
>> >
>> > Other thoughts?
>> > sage
>> >
>>
>>