Re: 10.2.11 Handover to QE

Nathan Cutler <ncutler@xxxxxxx> · Wed, 20 Jun 2018 23:33:18 +0200

(I'm still on bereavement leave, but here goes. . . .)

I think there's an expectation that releases are thoroughly tested. That 
means our *real* "best effort" consists of carefully preparing the 
release, getting leads to sign off, and putting it through QE, to ensure 
that the release is free of regressions.

It is of course possible to cut "releases" often, even frivolously for 
every single backport PR that gets merged, but a price is paid for that: 
instead of *us* doing the regression testing *before* the release, the 
*users* do the regression testing *after* the release!

Alfredo is right, though, that the longer the time interval between 
point releases, the greater the pressure to get "this" or "that" fix in, 
because everyone assumes that it will be a long time until the next 
release. This tends to exacerbate the delays, because regression testing 
either doesn't start or has to be restarted after "this" or "that" PR is 
merged.

The whole "when to release" problem is a political one, and any solution 
is necessarily going to be political as well.

The current status quo is clear: we release when we're good and ready. 
We try hard to avoid regressions. (That doesn't prevent us from engaging 
in some hand-wringing and soul-searching every time a release gets 
delayed, of course.)

What I would rather talk about is something I will call the "window of 
danger". The inability to cut releases from a specific SHA1 continually 
threatens to thwart our careful efforts to prevent regressions, because 
it creates a window (from the beginning of integration testing until the 
official RPMs/DEBs are built) during which regression-introducing PRs can 
get merged to the release branch and become part of the official release 
packages.

While I still firmly believe that implementing the missing 
release-from-SHA1 feature is what is needed to eliminate this window of 
danger once and for all, we don't have it and, apparently, will not have 
it for some time.
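To make the window concrete, here is a minimal Python sketch. It assumes the release branch can be modeled as a plain list of commit SHA1s, oldest first; the function name `window_of_danger` and the SHA1s are made up for illustration and are not part of any Ceph tooling. The untested commits are simply everything between the commit we tested and the commit the packages are actually built from.

```python
from typing import List


def window_of_danger(branch: List[str], tested_sha1: str, built_sha1: str) -> List[str]:
    """Return the commits that end up in the official packages without
    having been regression tested (toy model, hypothetical helper).

    branch: commits on the release branch, oldest first
    tested_sha1: the commit integration testing was run against
    built_sha1: the commit the official RPMs/DEBs are built from
    """
    tested = branch.index(tested_sha1)
    built = branch.index(built_sha1)
    # Anything merged after the tested commit, up to and including the
    # commit the packages are built from, ships untested.
    return branch[tested + 1 : built + 1]


# Hypothetical release branch history (made-up SHA1s, oldest first).
branch = ["a1f3", "b2e4", "c3d5", "d4c6", "e5b7"]

# Status quo: tests ran against c3d5, but packages are built from the tip.
print(window_of_danger(branch, "c3d5", branch[-1]))  # ['d4c6', 'e5b7']

# Release-from-SHA1: packages are built from exactly the tested commit.
print(window_of_danger(branch, "c3d5", "c3d5"))  # []
```

In this model, release-from-SHA1 closes the window by construction: when the build commit is the tested commit, the slice is empty no matter how many PRs land on the branch in the meantime.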

During the last incarnation of this thread, Sage proposed two 
alternatives, which I will call "Large number of reviewers" and "Special 
release branch", respectively.

"Large number of reviewers" - at the beginning of the window of danger, 
the release manager would increase the number of reviews needed to merge 
PRs to the release branch to some large number, like 100, and keep it 
there until the release is safely out. (While this measure is simple to 
implement, it's still subject to the human factor. To be successful, it 
would require vigilance and awareness-raising communications.)

"Special release branch" - all releases would be cut from a special 
branch, using a workflow something like this:

1. at the beginning of integration testing, branch luminous-next from luminous
2. build packages from luminous-next and regression test
3. if regression tests pass:
    a. sign and release those packages
    b. merge luminous-next back into luminous
4. if regression tests do not pass:
    a. merge necessary fixes into luminous
    b. reset luminous-next to luminous and go to step 2

(Like "Large number of reviewers", this measure would require that all 
stakeholders be informed and on-board, with perfect clarity on which 
branch to use for adding the release tag and building the official 
release RPMs/DEBs.)

In my mind, efforts to eliminate the "window of danger" are more likely 
to pay off, since it's a workflow problem that has a technical solution. 
The "when to release" issue, on the other hand, is political with no 
clear solution.

Nathan