Re: super-drafty F28 and F29 schedules

Adam Williamson <adamwill@xxxxxxxxxxxxxxxxx> · Wed, 12 Jul 2017 14:10:13 -0700

On Thu, 2017-07-06 at 21:15 -0400, Matthew Miller wrote:
> First, there is gating from rel-eng and QA in progress here: 
> https://fedoraproject.org/wiki/Changes/NoMoreAlpha (Note that this is
> compose/validation gating, not the CI stuff we're also talking about
> separately.) That's key in keeping the release basically stable. But
> there's another part:

So, I guess I should set some more detailed expectations here. At least
from my perspective on it.

Compose gating is something that, in principle, we *can* do already.
It's not very difficult. We have quite extensive automated testing of
each compose, between openQA and autocloud. We have the results
reported to resultsdb. It is not fundamentally difficult to write a
thing which queries resultsdb for the test results for a given compose
and makes a decision about whether that compose should be released,
based on those results.
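
As a rough illustration of how small that "thing" could be, here is a minimal sketch of the decision step only. It assumes the results for one compose have already been fetched from resultsdb and flattened into plain dicts; the 'testcase' and 'outcome' keys, the 'PASSED' value, and the test names are assumptions made for the example, not a statement of the real resultsdb schema or the real openQA test list.

```python
def compose_passes(results, required_testcases):
    """Decide whether one compose is releasable.

    results: iterable of dicts with 'testcase' and 'outcome' keys
             (hypothetical shape, chosen for this sketch).
    required_testcases: names of the tests that must all have passed.
    Returns (ok, problems), where problems lists every required test
    that either failed or has no result at all.
    """
    # A test counts as passed if at least one of its results is PASSED.
    passed = {r["testcase"] for r in results if r["outcome"] == "PASSED"}
    problems = sorted(set(required_testcases) - passed)
    return (not problems, problems)


# Made-up results for a single compose.
results = [
    {"testcase": "install_default", "outcome": "PASSED"},
    {"testcase": "base_selinux", "outcome": "FAILED"},
]
ok, problems = compose_passes(results, ["install_default", "base_selinux"])
print(ok, problems)
```

Note that a required test with no result at all is treated the same as a failure here; whether that is the right call is one of the details that would need deciding.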

However, there are some counterpoints. One, as always, there's devilry
in the details. We have to decide *what* the criteria are, exactly. For
autocloud, it's sort of 'easy', because autocloud results are not very
granular at all - it's basically a straight up pass/fail for each image
in a compose.

But for openQA...well, I can draw up a list of the openQA tests that
correspond to Alpha criteria, easily enough. But do we want to go
straight there, or start smaller?

Also, what do we do about...*complex* failures? For instance, I already
know, right now, that every so often, anaconda just crashes in the
middle of an install. It's been doing that for a year or so. It's a
mysterious python crash, and it happens very rarely. There are also
similar known 'it occasionally just crashes' bugs in GNOME and KDE -
sometimes KDE startup just fails and the system sits at a black screen
forever, occasionally GNOME crashes back to gdm shortly after login.
And of course sometimes openQA just screws up (it's not perfect), and
sometimes some network or mirror issue produces a failure, etc etc.

So do we introduce some sort of 'fuzz factor' and say 'it's OK if 90%
of the tests pass'? Do we set things up so the 'is it OK to ship?'
calculation automatically re-runs every time a test is restarted, so we
can just manually restart these kinds of tests and if they pass, the
compose will ship? Do I start writing some kind of complex openQA
plugin to try and identify 'known intermittent failures' and
automatically restart such tests? (This is the sort of thing that's
possible, but could also eat my life.)
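
The first two options can be sketched together, purely as an illustration. The sketch assumes each result carries a 'submitted' ordering value alongside hypothetical 'testcase' and 'outcome' keys, and that only the newest result per test counts, so re-running the calculation after a restarted test passes changes the answer. The 0.9 threshold is just the 90% figure from above, not a proposal.

```python
def latest_outcomes(results):
    """Keep only the newest result per test, so a restarted test that
    passes replaces its earlier failure."""
    latest = {}
    # Walk results oldest-first; later entries overwrite earlier ones.
    for r in sorted(results, key=lambda r: r["submitted"]):
        latest[r["testcase"]] = r["outcome"]
    return latest


def ok_to_ship(results, min_pass_ratio=0.9):
    """Apply the 'fuzz factor': ship if enough tests currently pass."""
    latest = latest_outcomes(results)
    if not latest:
        return False  # no results at all is not a pass
    passed = sum(1 for outcome in latest.values() if outcome == "PASSED")
    return passed / len(latest) >= min_pass_ratio


# Made-up data: ten tests, two of which fail on the first run.
results = [
    {"testcase": f"t{i}", "outcome": "PASSED", "submitted": 1}
    for i in range(8)
]
results += [
    {"testcase": "t8", "outcome": "FAILED", "submitted": 1},
    {"testcase": "t9", "outcome": "FAILED", "submitted": 1},
]
before = ok_to_ship(results)  # 8 of 10 passing

# Someone restarts t9 by hand and it passes this time.
results.append({"testcase": "t9", "outcome": "PASSED", "submitted": 2})
after = ok_to_ship(results)  # 9 of 10 passing
print(before, after)
```

A flat ratio like this treats every test as equally important, which is probably not what we would want for tests tied to release criteria.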

There's another big point: I suspect that doing this kind of compose
gating is going to *feel* like quite a big change to the distro
development process. releng has been kinda gradually introducing more
and more hurdles to the compose and sync process, but it's still *more
or less* the case that people expect a Rawhide compose to succeed and
sync every day - when composes fail for more than two or three days,
people start getting antsy. If we add compose gating, I really don't
think there's any possible outcome except that 'successful' composes
get noticeably rarer. I can't put numbers on that yet; we could write a
script which applies whatever criteria we decide on to the last, say,
year of Rawhide composes and gives us an idea what percentage would've
met the criteria, but of course that's not really a true representation
because the act of introducing criteria gives people a powerful
incentive to fix problems, one which wasn't there before. All we can really
say is that it's pretty likely that 'successful' Rawhide composes will
become noticeably rarer. And I'm pretty sure that will
have other consequences on how the whole process of working on Fedora
feels. But it's difficult to be too specific until we actually do this.
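
For what it's worth, the retrospective script mentioned above would not need to be complicated. A minimal sketch, assuming the historical results have already been collected into a dict keyed by compose ID (the compose IDs, test names, and outcomes below are made up for illustration):

```python
def passing_percentage(composes, criteria):
    """Percentage of composes in which every test in `criteria` passed.

    composes: dict mapping compose ID -> {testcase: outcome}
              (hypothetical shape, chosen for this sketch).
    criteria: names of the tests a compose must pass to count.
    """
    if not composes:
        return 0.0
    good = sum(
        1
        for outcomes in composes.values()
        # A test with no recorded result does not count as passed.
        if all(outcomes.get(test) == "PASSED" for test in criteria)
    )
    return 100.0 * good / len(composes)


# Made-up history of four composes.
history = {
    "Fedora-Rawhide-20170701.n.0": {"install_default": "PASSED", "desktop_login": "PASSED"},
    "Fedora-Rawhide-20170702.n.0": {"install_default": "FAILED", "desktop_login": "PASSED"},
    "Fedora-Rawhide-20170703.n.0": {"install_default": "PASSED"},
    "Fedora-Rawhide-20170704.n.0": {"install_default": "PASSED", "desktop_login": "PASSED"},
}
pct = passing_percentage(history, ["install_default", "desktop_login"])
print(f"{pct:.0f}% of composes would have met the criteria")
```

As noted, the number this produces describes how composes behaved without any gating incentive in place, so it is best read as a pessimistic estimate rather than a prediction.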

Honestly - I was kinda banking on us having a reasonable amount of the
usual 'slack time' at the start of a release cycle to try and do a bit
of a 'soft launch' of compose gating, with at least a few weeks for us
to shake down all the details and get a feel for how significant the
impact on the development process is. What you're talking about feels
somewhat different. I'm not necessarily comfortable with us banking on
the idea that compose gating is going to be something we can kick in
*immediately* with complete success, to the degree of basing F28/F29
plans on it.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net