On Sat, Feb 15, 2025 at 11:11:49AM -0500, Dusty Mabe wrote:
> On 2/15/25 9:54 AM, Zbigniew Jędrzejewski-Szmek wrote:
> > On Fri, Feb 14, 2025 at 02:40:29PM -0800, Adam Williamson wrote:
> >> On Fri, 2025-02-14 at 16:31 -0500, Dusty Mabe wrote:
> >>> IMO the bar would only need to be that high if the user had no way to ignore the test results.
> >>> All gating does here (IIUC) is require them to do an extra step before it automatically flows
> >>> into the next rawhide compose.
> >>
> >> again, technically, yes, but *please* let's not train people to have a
> >> pavlovian reaction to waive failures, that is not the way.
> >
> > IMO, the bar for *gating* tests needs to be high. I think 95% true
> > positives would be a reasonable threshold.
>
> I can't promise a 95% true positive rate. These aren't unit tests. They are system-wide
> tests that try to exercise real-world scenarios as much as possible. That means pulling
> things from github/quay/s3/Fedora infra/etc., and thus flakes happen. Now, in our
> tests we do collect failures and retry them. If a retry succeeds we take it as a success
> and never report the failure at all. However, there are parts of our pipeline that might
> not be so good at retrying.
>
> All I'm trying to say is that when you don't control everything, it's hard to say with
> confidence that something will be at 95%.

As AdamW wrote in the other part of the thread, OpenQA maintains a false
positive rate close to 0%. So it seems possible, even with our somewhat
unreliable infrastructure…

I am worried about the high failure rate of the coreos tests, but it is
possible that making them gating will improve their reliability. I know
that in the case of systemd, there was a failure that affected quite a few
updates because it wasn't fixed immediately. If we had blocked the first
update, the percentage of failures would have been lower. So I think it
makes sense to try this… If, after a few months, we still have too many
updates blocked by gating, we can reevaluate.

> As I promised before, maybe just work with us on it. These tests have been enabled for
> a while and I've only seen a handful of package maintainers look at the failures (you,
> Zbyszek, being one of them; thank you!).
>
> We do want them to be useful tests, and I promise that when a failure happens because of
> our infra or the tests themselves being flaky, we try to get it fixed.

One more question: are packagers able to restart the tests?

Zbyszek
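P.S. For anyone curious what "collect failures and retry them" amounts to in
practice, here is a minimal sketch in Python. This is only an illustration of
the general pattern, not the actual coreos pipeline code; the retry count,
delay, and the stand-in test command are made up.

```python
import subprocess
import time

def run_with_retries(cmd, attempts=3, delay=1):
    """Run a possibly-flaky test command, retrying on failure.

    If any attempt succeeds, the run is reported as a success and the
    earlier failures are never surfaced; only a command that fails on
    every attempt is reported as a failure.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True              # a successful retry masks earlier flakes
        if attempt < attempts:
            time.sleep(delay)        # brief back-off before retrying
    return False                     # genuine failure: every attempt failed

# Illustrative use only: "sh -c 'exit 1'" stands in for a real test command.
if not run_with_retries(["sh", "-c", "exit 1"]):
    print("failed after all retries; this is the case that would gate an update")
```

The point of the pattern is that transient failures (network, external
services) never reach the gating decision at all; only a test that fails on
every attempt counts against an update.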