On Fri, 2011-11-04 at 21:06 +0100, Henrik Nordström wrote:
> On Fri, 2011-11-04 at 11:53 -0600, Kevin Fenzi wrote:
>
> > If we do this, next cycle we should NOT do any 'two part' go/no-go
> > meetings.
>
> The "two part" meetings were both about critical blockers that were
> known and actively being worked on at the time of the meeting. This
> situation will happen no matter what day the Go/No-Go meetings are on.

Not necessarily. It's easy enough to go back over the logs; we've
certainly had quiet releases where nothing of the kind has happened. I
don't think we'd ever delayed the go/no-go prior to this cycle.

> The question asked is if having this "soft deadline" style is
> acceptable, or if we should stick to documented procedure where the
> release should have slipped at both those occasions.

Well, I don't think anyone so far has said it's acceptable; we all seem
to more or less agree we should stop doing that and stick to the dates
we set. The question is whether there's any real _need_ for the
go/no-go to be nearly a whole day ahead of the release readiness
meeting.

> While it's nice that we did not slip further (esp for me meeting users
> on the 11-13 Nov at FSCONS), it at the same time sets the wrong tone
> about seriousness of having critical bugs discovered so late in the
> process, and creates an enormous amount of stress and uncertainty for
> everyone involved.

I'm not sure it sets the wrong tone about the seriousness of the bugs -
there was never any question that if we actually couldn't resolve them,
we'd have to slip. But it certainly creates stress, yeah.

> I do not think it's a healthy sign to have as many tc+rc spins as we had
> this time. In how many of these respins were issues not reproducible
> during a netinstall?

I'm not sure the fact of having spins is a problem per se. Spins aren't
particularly difficult to do.

You can see the breakdown of all the RC spins:

https://fedorahosted.org/rel-eng/ticket/4967

RC2 was a quickie to fix a bug that was proposed soon after the initial
RC1. The bug is one that doesn't get covered by our formal validation
tests - we don't have any tests for remote home directories, but it's
something we do want to take as a blocker if an issue comes up. It's
not an issue you'd get on a straightforward local install.

The major issue for RC3 was upgrading a system which uses KDE, which
was broken due to a dependency issue. Our validation tests don't cover
this directly - upgrades are a massive area so what we test is just
default installations. But KDE is a supported desktop so when there's
an issue that would break the desktop for any KDE user who upgrades,
that obviously needs fixing. Again, this is not something you'd see on
a straight default install: you have to do a KDE install of F15 and
then try to upgrade it.

We also took an update to xfwm which fixed an Xfce nice-to-have...

...which actually broke Xfce thanks to one of my least favourite
situations, 'fix-this-bug-and-find-the-one-hiding-behind-it'. The RC3
fix meant that xfwm provided a dependency instead of metacity - which
we want, obviously, for Xfce - but in fact, gdm's fallback mode doesn't
work with anything but metacity. So when you fix it so the Xfce spin
doesn't have a bogus requirement for metacity, Xfce breaks, because it
turns out there's a *real* requirement for metacity which was
unexpressed. So in RC4, we took a gdm update to require metacity. But
that wasn't a blocker - only KDE and GNOME block releases.

The major blocker issue we fixed in RC4 was a regression in the kernel
which broke EFI boot on some systems. In theory you could hit this from
a netinst, sure - but you'd have to be installing via EFI on a system
affected by this bug. Previous RCs were tested on EFI by multiple
people, but none had a system affected by the bug.

Finally, RC5 fixed a bug when you do an F15 to F16 upgrade using the
traditional installer (DVD/netinst) written to a USB stick. Again, this
isn't something you'd hit just doing a straight-through install of F16.
Note that the bug is masked if you write the USB stick with dd instead
of livecd-iso-to-disk, and there is a further wrinkle involved in using
l-i-t-d concerning whether you use the --format parameter.

So of all the issues that necessitated RC respins, no, you would not
have hit a single one just doing a straight-through netinst or DVD
install on a typical hardware configuration. That's something that's
required to work at *Alpha* stage. By final, we're covering much more
complex issues. This is kind of what I mean when I say we're actually
holding ourselves to reasonably high quality standards these days.

I do suspect that, if you've never actually gotten involved in the
release validation cycle, it's easy to underestimate the sheer number
of possible configurations and workflows we have to consider. Just look
at the bugs fixed during RC stage, listed above - we've got remote home
directories, upgrades to systems using KDE, installs using EFI, and
upgrades using the installer written to a USB stick. Just look at those
variables - which desktop do you use? Do you boot via BIOS or EFI? Do
you use the traditional installer or a live image? Do you write it to a
USB stick or an actual disc? Do you use l-i-t-d or dd? Do you format
the USB stick or not? That's six 1/0 choices, 64 configurations - *just
considering the things that happened to be broken during F16 final RC
stage*. The total possible configuration space in all the things we
profess to consider release blocking is ridiculously huge; it's
probably somewhere in the millions. There's no way we can
systematically cover every issue.
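
To make the arithmetic above concrete, here is a rough Python sketch that
enumerates the six yes/no variables as if they were fully independent.
The dictionary keys, the GNOME/KDE pairing for "which desktop", and the
function name are illustrative assumptions, not anything taken from the
actual validation tooling.

```python
from itertools import product

# Six yes/no variables taken from the questions in the paragraph above.
# The labels (and the GNOME/KDE pairing) are illustrative assumptions.
CHOICES = {
    "desktop": ("GNOME", "KDE"),
    "firmware": ("BIOS", "EFI"),
    "image": ("traditional installer", "live image"),
    "medium": ("USB stick", "optical disc"),
    "usb_writer": ("livecd-iso-to-disk", "dd"),
    "format_stick": (False, True),
}


def enumerate_configs(choices):
    """Return every combination, treating the variables as independent."""
    names = list(choices)
    return [dict(zip(names, combo)) for combo in product(*choices.values())]


configs = enumerate_configs(CHOICES)
print(len(configs))  # 2 ** 6 == 64

# How many independent yes/no variables does it take to pass a million?
n = 0
while 2 ** n < 1_000_000:
    n += 1
print(n)  # 20, since 2 ** 20 == 1,048,576
```

This is the naive count: in practice some combinations collapse (the
writer and --format questions only matter when the medium is a USB
stick), so 64 is an upper bound for these six variables. The second
figure only shows that around twenty independent yes/no choices are
enough to reach "somewhere in the millions".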

We already are required to run this complete set of tests:

https://fedoraproject.org/wiki/Test_Results:Current_Installation_Test
https://fedoraproject.org/wiki/Test_Results:Current_Base_Test
https://fedoraproject.org/wiki/Test_Results:Current_Desktop_Test

for most composes (we fudge it a bit when the change between composes
is very small), and that doesn't really come close to covering
everything that can possibly go wrong. A 'full' validation matrix which
gave us near-100% confidence of covering everything on the first shot
would go on for pages and pages, and take massively more resources than
we have or, realistically, are ever going to have.

There are various 'hot topics' exposed during the F16 cycle that we'll
likely expand the matrix to cover better in F17 - bootloader location
issues, EFI issues, USB installer issues (when we first drew up the
installation tests, using USB sticks for installation was very rare,
and I think you couldn't actually write non-live images to USB at all),
and very large disks (plus 4k sector sizes) are some of the things
we're looking at - but we can't expand the coverage past the point
where we have the resources to actually carry it out, and we're never
going to be able to come really close to covering every possibility.
Given this, it's pretty impossible to avoid issues coming up along the
way, especially in a release like F16 which touches stuff we haven't
touched for a long time.
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | identi.ca: adamwfedora
http://www.happyassassin.net

--
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel