On Wed, 2020-10-28 at 12:29 -0400, Ben Cotton wrote:
> In yesterday's F33 emergency brake meeting[1], one of the issues that
> came up is the short window for testing all of the different IoT
> hardware. One way to reduce that effort (by extending the window) is
> reducing the change set in the days before an RC. I don't know if it
> would have helped in this particular case, as the kernel update that
> caused some of the problems had security fixes, so we may have left it
> in anyway. But the general idea is if we don't allow FEs all the way
> up to the RC compose, we reduce the number of things that could break.
>
> I don't know if I personally endorse this proposal or not, but in the
> interests of open discussion, I am proposing a change to the freeze
> exception process:
>
> > Updates with an accepted Freeze Exceptions will not be included less than X hours before the scheduled start of the Go/No-Go Meeting.
>
> I left the time unspecified for now, because we should figure out if
> we like the general concept before we decide on the specific deadline.
> I was thinking something like 72 hours (3 days), which would put the
> deadline at 1700 UTC Monday.
>
> For simplicity's sake, I'm only sending this to the QA list now. If we
> have a general consensus on it being a good thing, we should
> distribute it more broadly before we adopt it.

I don't think I agree with this, but I *do* think I handled this badly in the specific case we're referring to.

So, what happened here is:

* We had an accepted FE bug for a Bluetooth CVE (security) issue:
  https://bugzilla.redhat.com/show_bug.cgi?id=1888439

* A kernel update marked as fixing that bug was submitted on 2020-10-15 and submitted for stable later the same day. I submitted a stable push request including that update the next day, and it was pushed.
  2020-10-16 was exactly a week before the Final Go/No-Go meeting, so this update was in stable for a week before Go/No-Go:
  https://bodhi.fedoraproject.org/updates/FEDORA-2020-ce117eff51
  https://pagure.io/releng/issue/9725#comment-696530

* The kernel update first appeared in a compose validation event on 2020-10-19, only four days before Go/No-Go, because there was no nightly validation event between 2020-10-16 and RC 1.2, which was built on 2020-10-19.

* The update did not just fix the Bluetooth CVE: it was actually an *entire kernel patch version update*, from 5.8.14 to 5.8.15.

There is actually a policy that updates to fix blocker and FE issues should include the minimal change necessary to fix the bug. We (mainly I) have gotten progressively laxer about enforcing that in the last few years, to the point where I rarely actually bother with it at all except in super egregious cases. This is because we've kinda got a better track record of updates not breaking stuff, and we have much better test coverage than we used to.

However, that was obviously the problem in this case. We (mainly I) pulled in what was actually a fairly big change (new kernel release) quite late in the process when we could have insisted on a smaller change (just the CVE fix patch on top of 5.8.14), didn't include it in a candidate compose until even later in the process, and didn't flag it up as a significant change that needed testing. That's on me and I apologize for it.

(For the record, I didn't consciously consider this and decide it wasn't a big deal; I actually just kinda blew through the stable push request quickly, I don't recall why, and didn't *notice* we were pulling in an entire kernel release bump. Obviously, part of that is the above note that I've been getting generally laxer about checking this; a few years back I used to rigorously check how much change was in every single proposed blocker/FE update, nowadays I kinda...don't.)

So I don't think we really need a new rule here, we (mainly I) just need to go back to being more careful about the policy we already have. What should have happened here is I should have noticed we had an entire kernel version update proposed as the fix for an FE and talked to the kernel team about it. We could then either have decided to pull in the update, but make sure it was properly tested, including flagging it up to the ARM folks and making sure we had time to test it across the important ARM platforms; or we could have decided to not pull in 5.8.15 and instead do 5.8.14 with a patch for the CVE.

All of that is what *should have happened under the current rules*, and I just whiffed it. Again I'm sorry for that.

--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
_______________________________________________
test mailing list -- test@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to test-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/test@xxxxxxxxxxxxxxxxxxxxxxx