Re: testing downgrades

Tamilarasi Muthamizhan <tmuthami@xxxxxxxxxx> · Wed, 5 Sep 2018 13:21:22 -0700

+1 to downgrade testing between minor versions.

It is definitely a "nice-to-have" feature: it would let a cluster stay
functional by simply downgrading to version "N" until we debug and fix
the issue in version "N+1".

We might need a "downgrade" task in teuthology dedicated to handling
the issues that can be hit during downgrades. [The last time I tried a
downgrade, a couple of years ago or more, it was NOT straightforward:
downgrading ceph also means downgrading any other dependent packages.]

What kind of workloads should we run? We can choose to run a mix of
rados/rbd/rgw/fs workloads, like we do with upgrades. Should we also
consider doing rolling downgrades? [Example: in a real-world cluster,
if upgrading from N to N+1 poses a problem and the customer chooses to
downgrade, how would this impact the existing cluster and workloads?]

> Should we repurpose the p2p suite to do this?  Right now it steps through
> every stable release in sequence.  Instead, we could add N facets that
> upgrade (or downgrade) between versions, and make sure that N >= the total
> number of point releases.  This would be more like an upgrade/downgrade
> "thrasher" in that case...

- Great Idea!
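
To make the facet idea a bit more concrete, here is a minimal Python
sketch of the (initial version, target version) combinations such a suite
would end up covering. It is only an illustration: the function names and
the release list are made up for this example, nothing here is teuthology
API, and it assumes the first component of the version number identifies
the major release (so cross-major pairs are skipped).

```python
from itertools import permutations


def version_key(version):
    """Turn '13.2.2' into (13, 2, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def transition_pairs(releases):
    """Return (initial, target, kind) for every ordered pair of releases
    that share a major version. Hypothetical helper, not a teuthology call."""
    pairs = []
    for initial, target in permutations(releases, 2):
        if version_key(initial)[0] != version_key(target)[0]:
            continue  # downgrades across major versions are out of scope
        if version_key(target) > version_key(initial):
            kind = "upgrade"
        else:
            kind = "downgrade"
        pairs.append((initial, target, kind))
    return pairs


if __name__ == "__main__":
    # Example release list, chosen only for illustration.
    for initial, target, kind in transition_pairs(["13.2.0", "13.2.1", "13.2.2"]):
        print(f"{initial} -> {target}: {kind}")
```

Since the ordering cannot be enforced, half of the pairs come out as
upgrades and half as downgrades, which matches the "thrasher" flavour
described above.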

> 2. Consider downgrades when backporting any changes to stable releases.
> If we are adding fields to data structures, they need to work both in the
> upgrade case (which we're already good at) and the downgrade case.
> Usually the types of changes that make some behavior change happen in the
> major releases, but occasionally we make these changes in stable releases
> too.
- Having a standalone, basic downgrade test would help with backport
testing.
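
As an illustration of what "works in both the upgrade and the downgrade
case" means for an added field, here is a small Python sketch. It is not
Ceph's encoder; it only imitates the general idea of a versioned,
length-prefixed encoding in which new fields are appended at the end. The
field names (name, size, flags), the header layout, and the `understands`
parameter are all assumptions made up for this example.

```python
import struct

HEADER = "<BBI"  # struct version, oldest compatible version, payload length


def encode(name, size, flags=None):
    """Encode as v1 (no flags) or v2 (flags appended after the v1 fields)."""
    raw = name.encode()
    payload = struct.pack("<I", len(raw)) + raw + struct.pack("<Q", size)
    struct_v = 1
    if flags is not None:
        payload += struct.pack("<I", flags)  # new field goes at the end
        struct_v = 2
    # compat stays 1: a v1 decoder can still read the fields it knows about
    return struct.pack(HEADER, struct_v, 1, len(payload)) + payload


def decode(buf, understands=2):
    """Decode with a decoder that knows versions up to `understands`."""
    struct_v, compat_v, length = struct.unpack_from(HEADER, buf, 0)
    if compat_v > understands:
        raise ValueError("encoding is too new for this decoder")
    off = struct.calcsize(HEADER)
    (name_len,) = struct.unpack_from("<I", buf, off)
    off += 4
    name = buf[off:off + name_len].decode()
    off += name_len
    (size,) = struct.unpack_from("<Q", buf, off)
    off += 8
    out = {"name": name, "size": size}
    if understands >= 2:
        if struct_v >= 2:
            (out["flags"],) = struct.unpack_from("<I", buf, off)
        else:
            out["flags"] = 0  # upgrade case: old data, default the new field
    # Downgrade case: a v1 decoder stops here; the payload length in the
    # header tells it how many trailing bytes it can safely ignore.
    return out


if __name__ == "__main__":
    new_data = encode("obj", 4096, flags=7)
    print(decode(new_data, understands=1))  # old code reading new data
    print(decode(encode("obj", 4096)))      # new code reading old data
```

A basic downgrade test for a backport would exercise the first call
(older code reading data written by newer code); the upgrade suites
already cover the second.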

On Tue, Sep 4, 2018 at 12:31 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> There is no plan or expectation (on my part at least) to support
> downgrades across major versions (mimic -> luminous).  (IMO the complexity
> that would need to be introduced to allow this is not worth the
> investment.)
>
> However, there *is* an interest in testing (and being able to support)
> downgrades between minor versions.  For example, if you're running 13.2.2,
> and start rolling out 13.2.3 but things misbehave, it would be nice to be
> able to roll back to 13.2.2.
>
> So.. what do we need to do to allow this?
>
> 1. Create a test suite that captures the downgrade cases.  We could start
> with a teuthology facet for the initial version and have another facet for
> the target version.  Teuthology can't enforce a strict ordering (i.e.,
> always a downgrade) but it's probably just as valuable to also test the
> upgrade cases too.  The main challenge I see here is that we are regularly
> fixing bugs in the stable releases; since the tests are against older
> releases, problems we uncover are often things that we can't "fix" since
> it's existing code.
>
> It will probably (?) be the case that in the end we have known issues with
> downgrades with specific versions.
>
> What kind of workloads should we run?
>
> Should we repurpose the p2p suite to do this?  Right now it steps through
> every stable release in sequence.  Instead, we could add N facets that
> upgrade (or downgrade) between versions, and make sure that N >= the total
> number of point releases.  This would be more like an upgrade/downgrade
> "thrasher" in that case...
>
> 2. Consider downgrades when backporting any changes to stable releases.
> If we are adding fields to data structures, they need to work both in the
> upgrade case (which we're already good at) and the downgrade case.
> Usually the types of changes that make some behavior change happen in the
> major releases, but occasionally we make these changes in stable releases
> too.
>
> I can't actually think of a stable branch change that would be problematic
> right now... hopefully that's a good sign!
>
> Other thoughts?
> sage
>

--
Regards
Tamil