On Thu, 2012-11-22 at 11:03 -0700, Tim Flink wrote:
> The problem with our release validation process is where to draw the
> line between what we test and what we don't. Our test cases are pretty
> much human bash scripts and that seems to be what people expect. If
> you follow all the steps of a test case, you can mark it pass/fail and
> be done with it; no improvisation or filling in unwritten stuff should
> be required. I'm personally not a fan of this testing style but it is
> what it is and I choose my battles - test case writing style is not one
> that I'm choosing to fight right now.

There's a clear benefit to the human bash script style which I think is
important in the context of release validation. It ensures (or at least
tries to - even with our relatively strict test cases, we _still_ have
problems with this) that although we have different people in different
places in different time zones running the tests on different machines,
they're all broadly testing the same thing.

If our test cases were less deterministic, it'd be very difficult to
compare results. If the upgrade test case effectively just said 'run an
upgrade however you like', and we saw 'Fail' from Bob for TC2, 'Pass'
from Anna for TC4, 'Pass' from James for TC6, 'Fail' from Bob for TC8,
and 'Pass' from Anna for RC1, what conclusions could we draw about how
the upgrade behaviour had changed over those composes? Very few or
none, without careful analysis of any filed bugs, because we'd have no
idea whether all those people had been testing the same thing or really
quite different configurations.

Even though we try to be human-bash-script-y at present, the fact that
it's really difficult to nail down all environmental factors already
often gives us trouble replicating each other's results - we have lots
of 'how did you set up XYZ? was your repo on this server or that
server?' stuff flying around on IRC. If our tests were less
deterministic I think we'd find it even harder. We might gain a bit
more coverage, but at the cost of making it more or less impossible to
confidently compare test results from different testers across
releases.

> With the testing resources we have right now, I think that limiting the
> upgrade test case to the default package set is our best option. Does
> it cover everything? No. Does it cover the most common issues? Probably.
> It does make some assurances that the fedup process is at least working
> for a known base case.

Right, this is the main point here - as both you and Johann said, the
limited scope of the upgrade test is entirely intentional, an
acknowledgement of the size of the problem space and the resources we
have to apply to it.
-- 
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | identi.ca: adamwfedora
http://www.happyassassin.net
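
For a concrete sense of what the 'human bash script' style means in
practice, here is a rough sketch of the default-package-set upgrade
test written out as an actual script. This is a hypothetical
illustration, not the real test case page, and the exact fedup
invocation is an assumption based on the F18-era upgrade instructions:

    #!/bin/bash
    # Hypothetical sketch only - not the actual wiki test case.
    # Phase 1: run on a default ("known base case") Fedora 17 install.
    set -e
    yum -y update       # start from a fully updated system
    fedup --network 18  # stage the F18 upgrade (flags are an assumption)
    # The tester then reboots, picks the "System Upgrade" boot entry,
    # and fedup applies the staged upgrade.

    # Phase 2: run on the upgraded system to decide Pass/Fail.
    cat /etc/fedora-release  # should report Fedora release 18

Spelling things out at this level is exactly what makes the results
comparable: Bob, Anna and James are all guaranteed to be exercising
the same path on the same base configuration across composes.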