Re: testing downgrades

Yuri Weinstein <yweinste@xxxxxxxxxx> · Wed, 5 Sep 2018 13:37:56 -0700

I think it's a good idea that we have been talking about for a while!

What if we rename "downgrade" to "dry-run"?

I'd think that when real-world users upgrade a cluster, their first
worry is how to restore the cluster to its previous condition if
something goes wrong.

If we create a new "dry-run" feature, it would limit the testing to
the current version N and the new version N+1, and wouldn't go any
further.
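
To make the N / N+1 idea concrete, here is a minimal sketch in Python of
how such dry-run plans could be enumerated. It is not teuthology code:
the function names are hypothetical, and it assumes that two releases
belong to the same stable series when their first two version
components match (as in the 13.2.2 / 13.2.3 example quoted below).

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '13.2.3' into (13, 2, 3) so releases sort numerically."""
    return tuple(int(part) for part in version.split("."))


def dry_run_plans(releases: List[str]) -> List[List[str]]:
    """Build one plan per adjacent pair: N -> N+1 -> back to N.

    Assumption: releases share a stable series when their first two
    components match; pairs that cross series are skipped.
    """
    ordered = sorted(releases, key=parse_version)
    plans = []
    for current, new in zip(ordered, ordered[1:]):
        if parse_version(current)[:2] != parse_version(new)[:2]:
            continue  # crosses a major release, out of scope here
        plans.append([current, new, current])
    return plans


if __name__ == "__main__":
    for plan in dry_run_plans(["13.2.0", "13.2.1", "13.2.2", "13.2.3"]):
        print(" -> ".join(plan))
```

Each plan never reaches further than one step ahead, which is the
limit described above.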

Thx
YuriW

On Tue, Sep 4, 2018 at 12:31 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> There is no plan or expectation (on my part at least) to support
> downgrades across major versions (mimic -> luminous).  (IMO the complexity
> that would need to be introduced to allow this is not worth the
> investment.)
>
> However, there *is* an interest in testing (and being able to support)
> downgrades between minor versions.  For example, if you're running 13.2.2,
> and start rolling out 13.2.3 but things misbehave, it would be nice to be
> able to roll back to 13.2.2.
>
> So.. what do we need to do to allow this?
>
> 1. Create a test suite that captures the downgrade cases.  We could start
> with a teuthology facet for the initial version and have another facet for
> the target version.  Teuthology can't enforce a strict ordering (i.e.,
> always a downgrade) but it's probably just as valuable to also test the
> upgrade cases.  The main challenge I see here is that we are regularly
> fixing bugs in the stable releases; since the tests are against older
> releases, problems we uncover are often things that we can't "fix" since
> it's existing code.
>
> It will probably (?) be the case that in the end we have known issues with
> downgrades with specific versions.
>
> What kind of workloads should we run?
>
> Should we repurpose the p2p suite to do this?  Right now it steps through
> every stable release in sequence.  Instead, we could add N facets that
> upgrade (or downgrade) between versions, and make sure that N >= the total
> number of point releases.  This would be more like an upgrade/downgrade
> "thrasher" in that case...
>
> 2. Consider downgrades when backporting any changes to stable releases.
> If we are adding fields to data structures, they need to work both in the
> upgrade case (which we're already good at) and the downgrade case.
> Usually the types of changes that make some behavior change happen in the
> major releases, but occasionally we make these changes in stable releases
> too.
>
> I can't actually think of a stable branch change that would be problematic
> right now... hopefully that's a good sign!
>
> Other thoughts?
> sage
>
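
On point 2 in the quoted message (fields added to data structures have
to work in the downgrade case too), here is a generic sketch in Python of
one way that can work. It is an illustration only, not Ceph's actual
encoding code, and every name and field in it is hypothetical. The
assumption is that each encoded structure carries a version and a
payload length, so an older reader can skip trailing fields it does not
know about.

```python
import struct
from typing import Optional, Tuple

HEADER = "<BI"  # struct version (1 byte), payload length (4 bytes)
HEADER_SIZE = struct.calcsize(HEADER)


def encode(name: str, size: int, flags: Optional[int] = None) -> bytes:
    """Write v1 fields (name, size); append flags as a v2 field if given."""
    raw = name.encode("utf-8")
    payload = struct.pack("<H", len(raw)) + raw + struct.pack("<Q", size)
    version = 1
    if flags is not None:
        payload += struct.pack("<I", flags)  # new field goes at the end
        version = 2
    return struct.pack(HEADER, version, len(payload)) + payload


def decode_v1(buf: bytes) -> Tuple[str, int]:
    """Old reader: reads the v1 fields and ignores any trailing bytes."""
    _version, length = struct.unpack_from(HEADER, buf, 0)
    payload = buf[HEADER_SIZE:HEADER_SIZE + length]
    (name_len,) = struct.unpack_from("<H", payload, 0)
    name = payload[2:2 + name_len].decode("utf-8")
    (size,) = struct.unpack_from("<Q", payload, 2 + name_len)
    return name, size


def decode_v2(buf: bytes) -> Tuple[str, int, int]:
    """New reader: defaults flags to 0 when the writer was older."""
    version, length = struct.unpack_from(HEADER, buf, 0)
    payload = buf[HEADER_SIZE:HEADER_SIZE + length]
    name, size = decode_v1(buf)
    flags = 0
    if version >= 2:
        offset = 2 + len(name.encode("utf-8")) + 8
        (flags,) = struct.unpack_from("<I", payload, offset)
    return name, size, flags
```

In this sketch the upgrade case is decode_v2 reading what the old code
wrote, and the downgrade case is decode_v1 reading what the new code
wrote. The second only works because the new field is appended and the
old reader never assumes the payload ends after the fields it knows.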