+1 to downgrade testing between minor versions. It is definitely a
"nice-to-have" feature: it would allow the cluster to remain functional by
just downgrading to version "N" until we debug and fix the issue in version
"N+1".

We might need a "downgrade" task in teuthology to exclusively take care of
issues that might be hit during downgrades. [The last time I tried a
downgrade, which was a couple of years back or more, it was NOT
straightforward. Downgrading ceph also means downgrading any other
dependent packages.]

> What kind of workloads should we run?

We can choose to run a mix of rados/rbd/rgw/fs workloads like we do with
upgrades. Should we also consider doing rolling downgrades? [Example: in a
real-world cluster, if upgrading from N to N+1 poses a problem and the
customer chooses to downgrade, how would this impact the existing cluster
and workloads?]

> Should we repurpose the p2p suite to do this? Right now it steps through
> every stable release in sequence. Instead, we could add N facets that
> upgrade (or downgrade) between versions, and make sure that N >= the total
> number of point releases. This would be more like an upgrade/downgrade
> "thrasher" in that case...

Great idea!

> 2. Consider downgrades when backporting any changes to stable releases.
> If we are adding fields to data structures, they need to work both in the
> upgrade case (which we're already good at) and the downgrade case.
> Usually the types of changes that make some behavior change happen in the
> major releases, but occasionally we make these changes in stable releases
> too.

Having standalone basic downgrade testing would help with backport testing.

On Tue, Sep 4, 2018 at 12:31 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> There is no plan or expectation (on my part at least) to support
> downgrades across major versions (mimic -> luminous). (IMO the complexity
> that would need to be introduced to allow this is not worth the
> investment.)
>
> However, there *is* an interest in testing (and being able to support)
> downgrades between minor versions. For example, if you're running 13.2.2,
> and start rolling out 13.2.3 but things misbehave, it would be nice to be
> able to roll back to 13.2.2.
>
> So.. what do we need to do to allow this?
>
> 1. Create a test suite that captures the downgrade cases. We could start
> with a teuthology facet for the initial version and have another facet for
> the target version. Teuthology can't enforce a strict ordering (i.e.,
> always a downgrade) but it's probably just as valuable to also test the
> upgrade cases too. The main challenge I see here is that we are regularly
> fixing bugs in the stable releases; since the tests are against older
> releases, problems we uncover are often things that we can't "fix" since
> it's existing code.
>
> It will probably (?) be the case that in the end we have known issues with
> downgrades with specific versions.
>
> What kind of workloads should we run?
>
> Should we repurpose the p2p suite to do this? Right now it steps through
> every stable release in sequence. Instead, we could add N facets that
> upgrade (or downgrade) between versions, and make sure that N >= the total
> number of point releases. This would be more like an upgrade/downgrade
> "thrasher" in that case...
>
> 2. Consider downgrades when backporting any changes to stable releases.
> If we are adding fields to data structures, they need to work both in the
> upgrade case (which we're already good at) and the downgrade case.
> Usually the types of changes that make some behavior change happen in the
> major releases, but occasionally we make these changes in stable releases
> too.
>
> I can't actually think of a stable branch change that would be problematic
> right now... hopefully that's a good sign!
>
> Other thoughts?
> sage
>

--
Regards
Tamil