On Tue, 2012-10-09 at 16:48 -0400, David Cantrell wrote:
> On Tue, Oct 09, 2012 at 07:19:25AM -0600, Tim Flink wrote:
> > As we're getting closer to the scheduled time for beta freeze, we'd
> > like to find out now if any of the current criteria or proposed criteria
> > changes are unreasonable to expect for beta. There may be more changes
> > for final as we get closer to that but I think that we're pretty close
> > to being done with the release requirements for beta.
> >
> > The current (as of this writing) release criteria are available at:
> > - http://fedoraproject.org/wiki/Fedora_18_Alpha_Release_Criteria
> > - http://fedoraproject.org/wiki/Fedora_18_Beta_Release_Criteria
> > - http://fedoraproject.org/wiki/Fedora_18_Final_Release_Criteria

Thanks David! Some thoughts follow.

> I would like to see changes to the blocker criteria for each release. The
> first item on each release criteria is that all blockers must be CLOSED.
> Blockers are determined by criteria defined below which always group
> anaconda in because we cannot address those problems in a later update
> release. This gets us on the bug fixing treadmill as we edge closer to each
> release because every anaconda bug more or less becomes a blocker.

This paragraph was a bit tricky to read, but now I've given it a few tries, it seems to be more or less a preamble, yes? I'm not sure if you're suggesting that "Blockers are determined by criteria defined below which always group anaconda in because we cannot address those problems in a later update release. This gets us on the bug fixing treadmill as we edge closer to each release because every anaconda bug more or less becomes a blocker." is a problem, or just mentioning it as background. It's perfectly true as background, but I don't see it as a problem: it's just an innate characteristic of the software you write.
The installer is something that cannot be updated (for practical purposes), so it must work to a high standard as shipped, because if it doesn't, that's a much bigger problem than a component which _can_ be updated not working. I agree with your assessment, but I see it as an inherent characteristic of an operating system installer, not any kind of problem in the process.

> What I
> would like to see:
>
> 1) Installer blocker criteria needs to be more and more restrictive the
> closer we get to a final release.

This wasn't entirely clear to me, but I'm going to take a guess at what I think you mean and reply to that. I think you're looking at the situation where we get late into the validation process - say, we just built RC2 and it's two days to go/no-go - and we find five bugs and mark them as blockers. I'm guessing you're saying it'd be preferable to identify blockers early and we should only add issues to the blocker list late if they're _really bad_, because otherwise you just keep fixing blockers. On that basis...

I appreciate that the 'blocker treadmill', as you describe it, can be frustrating. But I don't think 'let's just not count bad issues as blockers late in the process to give the developers a break' is the answer to anything (except possibly 'how can we stop Will depleting the U.S. strategic gin reserve?', but that's not the question this post was trying to answer :>).

What we're trying to do with the release validation process as a whole is provide a clear framework for defining the standards our releases should meet, and a clear process for building releases that meet those standards and verifying that they do. I don't see that adding an element of time sensitivity to the blocker evaluation process - 'issues of the type X are blockers if we find them four weeks before release, but not if we find them one week before release' - is a good way to achieve this.
'Blocker bugs' are just the 'release quality' question inverted: they are the ways in which our releases must not be broken in order to meet the minimum quality standards we've decided on. An issue which causes us to fall below our minimum quality standards is a problem no matter when it's discovered.

I absolutely understand that it makes things easier for the developers if we catch blocker bugs early, and we agree this is an important goal and we have made and will continue to make efforts to improve our ability to catch blockers as early as possible. I know it sucks when we're on RC3 and we suddenly discover a major bug. But it's still a major bug, and 'say it's not a blocker because we're late in the process' doesn't sound like a good response to that suckage, to me. I don't want to do that.

I believe we should set realistic minimum standards - those that are achievable with the level of development resources we have in place, on the release schedules Fedora is committed to. What this thread is about, essentially, is checking that we are not currently setting that bar too high, and demanding from you more than you have the resources to possibly provide in the time available. We certainly believe that we need input from the development teams to know where the bar should be set. But I do believe the bar should be a bar, not a fuzzy field that can be adjusted with excessive pragmatism. We should set realistic standards, but they have to be solid ones that we don't compromise just because time is short or the developers are getting tired of fixing bugs.

What we (QA) as a team do try and do in those cases is look at the situation and think what we could do in future to ensure the blocker would get caught earlier.
For instance, in the last few releases we've been making a more concerted effort to complete testing even on TC/RC builds that have obvious showstoppers - to catch the other bugs 'behind' the showstoppers, rather than just catching the showstoppers and then focusing work on getting them fixed, then continuing on with testing of other functionality.

I don't mean to start a finger-pointing match, but I do think it's worth bearing in mind that the 'blocker treadmill' is much more likely to happen when there are major changes to anaconda, because these massively increase the surface area of code that's prone to causing blocker bugs.

When we do a release, we can say with a reasonable degree of certainty that the code in that release probably contains very few blocker bugs - only ones we didn't catch in the validation process it just went through. If we then do another release in which that code isn't changed very much, well, we aren't likely to have two hundred new blocker bugs. But if we (Fedora) do, oh, let's just say as _entirely theoretical examples_, rewrite the entire storage backend, or replace the entire first stage of the whole installer, or rewrite the entire user interface...we've just thrown out all the code that's relatively well known to be 'blocker free', and replaced it with an entirely new chunk of code about which we know just about nothing from a quality perspective.

Statistically speaking, no matter how awesome the person or people writing it, that new chunk of code is very likely to contain more blockers than the code it replaced. Major changes to the code inevitably result in more blockers being present, and thus more blocker treadmill, than light-touch maintenance of a mature codebase does. We (QA) are always going to be able to find ten blockers in a well-known codebase much faster than we can find two hundred blockers in a heavily revised codebase.
Certainly QA has some responsibility for the 'blocker treadmill': as I noted above, it's our responsibility to try and identify blocker bugs as early and as quickly as possible, and this is something we can and should always look to improve. But developers also have responsibility for it. If you're stuck on a 'blocker treadmill' it could be an indication that QA could and should have discovered the blocker bugs faster, but it could also be an indication that you have been too ambitious in planning how much new or revised code of acceptable quality you could implement in a given time frame. The consequence is that you deliver heavily bugged code late enough in the development cycle that you immediately wind up on a 'blocker treadmill', just fixing all the bugs in the code you just delivered. I don't think it's controversial to say this has been known to be a problem in the world of software development before :)

> 2) Installer blockers should only be granted when there is no other way to
> accomplish the same task during installation. For example, if FCoE
> configuration does not always work in the UI but does work when passed boot
> parameters or via kickstart, we shouldn't consider it a blocker. It's an
> unfortunate bug, but as described there is an install path for those users.

In practice we do and always have considered workarounds in evaluating blocker status for bugs. This isn't brilliantly called out in the criteria pages, I admit, and we should improve that. The section '(release) Blocker Bugs', right below the criteria, could really do with some adjustment.

It's hard to be more precise than this because workaround evaluation is one part of the blocker review process that more or less inevitably continues to involve subjectivity, and it's very much a bug-by-bug thing.
But obviously, the more severe and more commonly-encountered the issue, the less likely we usually are to accept 'there's a workaround' as a reason not to take it as a blocker. The ease of the workaround and the likelihood of a user thinking of it themselves - or at least figuring that there _might_ be a workaround, and they should go and look for one - are also taken into consideration.

So...we do consider workarounds. And yes, this should be explained more clearly in the process documentation, we'll address that. I don't think we should accept your principle - "Installer blockers should only be granted when there is no other way to accomplish the same task during installation." - as solidly as it's stated, though, as it removes too much flexibility in the evaluation process.

To give a competing example, in anaconda 18.13 there is a bug in the new partitioning process - I call it 'guided partitioning', the dialog which attempts to help you free up space on a full disk, by deleting or shrinking partitions - which causes it to crash when trying to delete partitions. But if you go into the 'custom partitioning' interface you can successfully delete partitions. So by your principle, we would not take that bug as a release blocker.

I don't think that would be a good decision: we should not release an installer which crashes when you try to follow the path you're guided to, for freeing up space to install the operating system. 'Don't do what the installer recommends you to do, instead go into this advanced process that's supposed to be for experts' is a workaround, and hence satisfies your requirement for not-a-blocker, but I really don't think it's a good story to tell people in the case of such a critical bit of functionality.

It also has a clear negative effect on the very problem we're discussing here: the broken code cannot get any testing.
All we can know about a codepath that's broken, but for which we accepted a workaround that dodges the broken codepath, is 'it's completely broken'. If the broken codepath is not treated as a blocker and fixed rapidly, we cannot test it 'beyond' the blocker bug. There might be five further blockers behind that bug, or just _regular_ bugs ('the UI sucks', 'it doesn't offer to let me resize a partition it should have done', 'it prints a bogus error message when I delete a partition'...all those kinds of perfectly normal bugs), but if we take a workaround and call it 'not a blocker' it gets dropped in priority, likely doesn't get fixed for weeks, and when it turns out there are five other bugs 'behind' the showstopper...well, they go on your treadmill. =) Accepting workarounds too readily actually _impedes_ our ability to find other bugs swiftly.

> 3) Ultimately we want the number of granted blockers to be lower and lower
> from alpha to beta to rc.

I understand the motivation behind this, and I think it's a goal we can attempt to address by ensuring comprehensive testing is done early (and a goal you can help to address by ensuring major code changes land early enough to be tested, and budgeting time and resources for fixing the bugs that will _inevitably be present_ in any large chunk of new code). But I don't think 'make it harder for a bug to qualify as a blocker the later we get in the release process' is a good thing to do, even though it would help to achieve this goal. To me it looks like a process hack which would ultimately damage the quality of our releases.

It's actually something that we've specifically tried *not* to do in the blocker review process, since it was implemented.
We have very intentionally attempted to review bugs 'impartially', treating blocker status as something a bug either should have or should not have on its own merits, and attempting not to take into account things that strictly should not be taken into account, like 'is there a fix already?' or 'how close are we to release?'

Thanks again for your thoughts!
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | identi.ca: adamwfedora
http://www.happyassassin.net