Re: [Test-Announce] Fedora 16 Final Release Declared GOLD!

Adam Williamson <awilliam@xxxxxxxxxx> · Fri, 04 Nov 2011 22:22:23 -0700

On Fri, 2011-11-04 at 21:06 +0100, Henrik Nordström wrote:
> Fri 2011-11-04 at 11:53 -0600, Kevin Fenzi wrote:
> 
> > If we do this, next cycle we should NOT do any 'two part' go/no-go
> > meetings. 
> 
> The "two part" meetings were both about critical blockers that were
> known and actively being worked on at the time of the meeting. This
> situation will happen no matter what day the Go/No-Go meetings are on.

Not necessarily. It's easy enough to go back over the logs; we've
certainly had quiet releases where nothing of the kind has happened. I
don't think we'd ever delayed the go/no-go prior to this cycle.

> The question asked is if having this "soft deadline" style is
> acceptable, or if we should stick to documented procedure where the
> release should have slipped on both those occasions.

Well, I don't think anyone so far has said it's acceptable; we all seem
to more or less agree we should stop doing that and stick to the dates
we set. The question is whether there's any real _need_ for the go/no-go
to be nearly a whole day ahead of the release readiness meeting.

> While it's nice that we did not slip further (esp for me meeting users
> on the 11-13 Nov at FSCONS), it at the same time sets the wrong tone
> about the seriousness of having critical bugs discovered so late in the
> process, and creates an enormous amount of stress and uncertainty for
> everyone involved.

I'm not sure it sets the wrong tone about the seriousness of the bugs -
there was never any question that if we actually couldn't resolve them,
we'd have to slip. But it certainly creates stress, yeah.

> I do not think it's a healthy sign to have as many tc+rc spins as we had
> this time. In how many of these respins were issues not reproducible
> during a netinstall?

I'm not sure the fact of having spins is a problem per se. Spins aren't
particularly difficult to do.

You can see the breakdown of all the RC spins:

https://fedorahosted.org/rel-eng/ticket/4967

RC2 was a quick respin to fix a bug that was proposed soon after the initial
RC1. The bug is one that doesn't get covered by our formal validation
tests - we don't have any tests for remote home directories, but it's
something we do want to take as a blocker if an issue comes up. It's not
an issue you'd get on a straightforward local install.

The major issue for RC3 was upgrading a system which uses KDE, which was
broken due to a dependency issue. Our validation tests don't cover this
directly - upgrades are a massive area so what we test is just default
installations. But KDE is a supported desktop so when there's an issue
that would break the desktop for any KDE user who upgrades, that
obviously needs fixing. Again, this is not something you'd see on a
straight default install: you have to do a KDE install of F15 and then
try to upgrade it. We also took an update to xfwm which fixed an Xfce
nice-to-have...

...which actually broke Xfce thanks to one of my least favourite
situations, 'fix-this-bug-and-find-the-one-hiding-behind-it'. The RC3
fix meant that xfwm provided a dependency instead of metacity - which we
want, obviously, for Xfce - but in fact, gdm's fallback mode doesn't
work with anything but metacity. So when you fix it so the Xfce spin
doesn't have a bogus requirement for metacity, Xfce breaks, because it
turns out there's a *real* requirement for metacity which was
unexpressed. So in RC4, we took a gdm update to require metacity. But
that wasn't a blocker - only KDE and GNOME block releases. The major
blocker issue we fixed in RC4 was a regression in the kernel which broke
EFI boot on some systems. In theory you could hit this from a netinst,
sure - but you'd have to be installing via EFI on a system affected by
this bug. Previous RCs were tested on EFI by multiple people, but none
had a system affected by the bug.

Finally, RC5 fixed a bug when you do an F15 to F16 upgrade using the
traditional installer (DVD/netinst) written to a USB stick. Again, this
isn't something you'd hit just doing a straight through install of F16.
Note that the bug is masked if you write the USB stick with dd instead
of livecd-iso-to-disk, and there is a further wrinkle involved in using
l-i-t-d concerning whether you use the --format parameter.

So of all the issues that necessitated RC respins, no, you would not
have hit a single one just doing a straight-through netinst or DVD
install on a typical hardware configuration. That's something that's
required to work at *Alpha* stage. By final, we're covering much more
complex issues. This is kind of what I mean when I say we're actually
holding ourselves to reasonably high quality standards these days.

I do suspect that, if you've never actually gotten involved in the
release validation cycle, it's easy to underestimate the sheer number of
possible configurations and workflows we have to consider. Just look at
the bugs fixed during RC stage, listed above - we've got remote home
directories, upgrades to systems using KDE, installs using EFI, and
upgrades using the installer written to a USB stick. Just look at those
variables - which desktop do you use? Do you boot via BIOS or EFI? Do
you use the traditional installer or a live image? Do you write it to a
USB stick or an actual disc? Do you use l-i-t-d or dd? Do you format the
USB stick or not? That's six 1/0 choices, 64 configurations - *just
considering the things that happened to be broken during F16 final RC
stage*. The total possible configuration space in all the things we
profess to consider release blocking is ridiculously huge, it's probably
somewhere in the millions. There's no way we can systematically cover
every issue.
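
As a rough illustration of that arithmetic, here is a minimal Python sketch
that enumerates the six yes/no choices named above. The option labels are
shorthand made up for this example (they are not names from any Fedora
tool), and treating the desktop choice as a two-way GNOME/KDE pick is a
simplification. Some combinations are not meaningful in practice (the USB
writer choice is moot for an optical disc, for instance), so the count is
an upper bound.

```python
from itertools import product

# Hypothetical labels for the six two-way choices listed above;
# none of these names come from a real Fedora tool or test matrix.
CHOICES = {
    "desktop": ("GNOME", "KDE"),
    "firmware": ("BIOS", "EFI"),
    "installer": ("traditional", "live"),
    "medium": ("USB stick", "optical disc"),
    "usb_writer": ("livecd-iso-to-disk", "dd"),
    "format_usb": (True, False),
}


def enumerate_configs(choices):
    """Return every combination of options as a list of dicts."""
    keys = list(choices)
    return [
        dict(zip(keys, combo))
        for combo in product(*(choices[k] for k in keys))
    ]


configs = enumerate_configs(CHOICES)
print(len(configs))  # 2**6 = 64
```

Each extra independent two-way choice doubles the count, so around 20 such
choices is already over a million (2**20 = 1,048,576).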

We already are required to run this complete set of tests:

https://fedoraproject.org/wiki/Test_Results:Current_Installation_Test
https://fedoraproject.org/wiki/Test_Results:Current_Base_Test
https://fedoraproject.org/wiki/Test_Results:Current_Desktop_Test

for most composes (we fudge it a bit when the change between composes is
very small), and that doesn't really come close to covering everything
that can possibly go wrong. A 'full' validation matrix which gave us
near-100% confidence of covering everything on the first shot would go
on for pages and pages, and take massively more resources than we have
or, realistically, are ever going to have.

There are various 'hot topics' exposed during the F16 cycle that we'll
likely expand the matrix to cover better in F17 - bootloader location
issues, EFI issues, USB installer issues (when we first drew up the
installation tests, using USB sticks for installation was very rare, and
I think you couldn't actually write non-live images to USB at all), and
very large disks (plus 4k sector sizes) are some of the things we're
looking at - but we can't expand the coverage past the point where we
have the resources to actually carry it out, and we're never going to be
able to come really close to covering every possibility. Given this,
it's pretty much impossible to avoid issues coming up along the way,
especially in a release like F16 which touches stuff we haven't touched
for a long time.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | identi.ca: adamwfedora
http://www.happyassassin.net
