On 9/11/23 14:51, Maxime Ripard wrote:
> On Mon, Sep 11, 2023 at 02:13:43PM +0200, Michel Dänzer wrote:
>> On 9/11/23 11:34, Maxime Ripard wrote:
>>> On Thu, Sep 07, 2023 at 01:40:02PM +0200, Daniel Stone wrote:
>>>>
>>>> Secondly, we will never be there. If we could pause for five years and sit
>>>> down making all the current usecases for all the current hardware on the
>>>> current kernel run perfectly, we'd probably get there. But we can't: there's
>>>> new hardware, new userspace, and hundreds of new kernel trees.
>>>
>>> [...]
>>>
>>> I'm not sure it's actually an argument, really. 10 years ago, we would
>>> never have been at "every GPU on the market has an open-source driver"
>>> here. 5 years ago, we would never have been at this-series-here. That
>>> didn't stop anyone making progress, everyone involved in that thread
>>> included.
>>
>> Even assuming perfection is achievable at all (which is very doubtful,
>> given the experience from the last few years of CI in Mesa and other
>> projects), if you demand perfection before even taking the first step,
>> it will never get off the ground.
>
> Perfection and scale from the get-go isn't reasonable, yes. Building a
> small, "perfect" (your words, not mine) system that you can later expand
> is doable.

I mean "perfect" as in every single available test runs, is reliable and
gates CI. Which seems to be what you're asking for. The only possible
expansion of such a system would be adding new 100% reliable tests.

What is being proposed here is an "imperfect" system which takes into
account the reality that some tests are not 100% reliable, and can be
improved gradually while already preventing some regressions from
getting merged.


>>> How are we even supposed to detect those failures in the first
>>> place if tests are flagged as unreliable?
>>
>> Based on experience with Mesa, only a relatively small minority of
>> tests should need to be marked as flaky / not run at all. The majority
>> of tests are reliable and can catch regressions even while some tests
>> are not yet.
>
> I understand and acknowledge that it worked with Mesa. That's great for
> Mesa. That still doesn't mean that it's the panacea and is for every
> project.

Not sure what you're referring to by panacea, or how it relates to "some
tests can be useful even while others aren't yet".


>>> No matter what we do here, what you describe will always happen. Like,
>>> if we do flag those tests as unreliable, what exactly prevents another
>>> issue to come on top undetected, and what will happen when we re-enable
>>> testing?
>>
>> Any issues affecting a test will need to be fixed before (re-)enabling
>> the test for CI.
>
> If that underlying issue is never fixed, at which point do we consider
> that it's a failure and should never be re-enabled? Who has that role?

Not sure what you're asking. Anybody can (re-)enable a test in CI, they
just need to make sure first that it is reliable. Until somebody does
that work, it'll stay disabled in CI.
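To make that concrete, here's a rough sketch of what "disabled in CI"
means with expectation lists. This is purely illustrative Python, not
the actual drm-ci scripts; the list format, the test names and their
pass/fail status below are made up. A test on the flaky list still runs
and gets reported, but only failures of tests that aren't on the list
block a merge; removing an entry (through a normal reviewed patch) is
what re-enables gating for that test.

  # Rough sketch of gating with a flaky-test list; NOT the actual drm-ci
  # code, the file format and test statuses here are only illustrative.

  def parse_expectations(text):
      """One test name per line; blank lines and '#' comments are ignored."""
      tests = set()
      for line in text.splitlines():
          line = line.split("#", 1)[0].strip()
          if line:
              tests.add(line)
      return tests

  def blocking_failures(results, flaky):
      """results maps test name -> passed?; return the failures that gate."""
      return {name for name, passed in results.items()
              if not passed and name not in flaky}

  # Illustrative flaky list; in practice this lives as a file in the tree,
  # updated through normal patch review, so the entry and/or the commit log
  # can carry context (bug link, failure rate, affected hardware, ...).
  FLAKES = parse_expectations("""
  # https://gitlab.example.org/issues/1234 - fails ~1 in 50 runs on board X
  kms_async_flips@async-flip-with-page-flip-events
  """)

  results = {
      "kms_addfb_basic@addfb25-bad-modifier": False,              # reliable test -> gates
      "kms_async_flips@async-flip-with-page-flip-events": False,  # flaky -> reported only
  }

  assert blocking_failures(results, FLAKES) == {"kms_addfb_basic@addfb25-bad-modifier"}

That's also why context on the list entries (or in the commits adding
them) matters: it's what later tells somebody whether the underlying
issue was ever fixed.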
>>> It might or might not be an issue for Linus' release, but I can
>>> definitely see the trouble already for stable releases where fixes will
>>> be backported, but the test state list certainly won't be updated.
>>
>> If the stable branch maintainers want to take advantage of CI for the
>> stable branches, they may need to hunt for corresponding state list
>> commits sometimes. They'll need to take that into account for their
>> decision.
>
> So we just expect the stable maintainers to track each and every patch
> involved in a test run, make sure that they are in a stable tree, and
> then update the test list? Without having consulted them at all?

I don't expect them to do anything. See the "If" at the start of what I
wrote.


>>>> By keeping those sets of expectations, we've been able to keep Mesa pretty
>>>> clear of regressions, whilst having a very clear set of things that should
>>>> be fixed to point to. It would be great if those set of things were zero,
>>>> but it just isn't. Having that is far better than the two alternatives:
>>>> either not testing at all (obviously bad), or having the test always be red
>>>> so it's always ignored (might as well just not test).
>>>
>>> Isn't that what happens with flaky tests anyway?
>>
>> For a small minority of tests. Daniel was referring to whole test suites.
>>
>>> Even more so since we have 0 context when updating that list.
>>
>> The commit log can provide whatever context is needed.
>
> Sure, I've yet to see that though.
>
> There are around 240 reported flaky tests in 6.6-rc1. None of them have
> any context. That new series has a few dozen too, without any context
> either. And there's no mention of that being a plan, or a patch adding
> a new policy for all tests going forward.

That does sound bad; it would need to be raised in review.


> Any concern I raised was met with a giant "it worked on Mesa" handwave

Lessons learned from years of experience with big real-world CI systems
like this are hardly "handwaving".


--
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast |   Mesa and Xwayland developer