On Mon, Sep 11, 2023 at 02:13:43PM +0200, Michel Dänzer wrote:
> On 9/11/23 11:34, Maxime Ripard wrote:
> > On Thu, Sep 07, 2023 at 01:40:02PM +0200, Daniel Stone wrote:
> >> Yeah, this is what our experience with Mesa (in particular) has taught us.
> >>
> >> Having 100% of the tests pass 100% of the time on 100% of the platforms is a
> >> great goal that everyone should aim for. But it will also never happen.
> >>
> >> Firstly, we're just not there yet today. Every single GPU-side DRM driver
> >> has userspace-triggerable faults which cause occasional errors in GL/Vulkan
> >> tests. Every single one. We deal with these in Mesa by retrying; if we
> >> didn't retry, across the breadth of hardware we test, I'd expect 99% of
> >> should-succeed merges to fail because of these intermittent bugs in the DRM
> >> drivers.
> >
> > So the plan is only to ever test rendering devices? It should have been
> > made clearer then.
> >
> >> We don't have the same figure for KMS - because we don't test it - but
> >> I'd be willing to bet no driver is 100% if you run tests often enough.
> >
> > And I would still consider that a bug that we ought to fix, and
> > certainly not something we should sweep under the rug. If half the tests
> > are not running on a driver, then fine, they aren't. I'm not really
> > against having failing tests, I'm against not flagging unreliable tests
> > on a given hardware as failing tests.
>
> A flaky test will by definition give a "pass" result at least some of
> the time, which would be considered a failure by the CI if the test is
> marked as failing.
>
> >> Secondly, we will never be there. If we could pause for five years and sit
> >> down making all the current usecases for all the current hardware on the
> >> current kernel run perfectly, we'd probably get there. But we can't: there's
> >> new hardware, new userspace, and hundreds of new kernel trees.
> >
> > Not with that attitude :)
>
> Attitude is not the issue, the complexity of the multiple systems
> involved is.

FTR, that was a meme/joke.

> > I'm not sure it's actually an argument, really. 10 years ago, we would
> > never have been at "every GPU on the market has an open-source driver"
> > here. 5 years ago, we would never have been at this-series-here. That
> > didn't stop anyone making progress, everyone involved in that thread
> > included.
>
> Even assuming perfection is achievable at all (which is very doubtful,
> given the experience from the last few years of CI in Mesa and other
> projects), if you demand perfection before even taking the first step,
> it will never get off the ground.

Expecting perfection and scale from the get-go isn't reasonable, yes.
Building a small, "perfect" (your words, not mine) system that you can
later expand is doable. And that's very much a design choice.

> > How are we even supposed to detect those failures in the first
> > place if tests are flagged as unreliable?
>
> Based on experience with Mesa, only a relatively small minority of
> tests should need to be marked as flaky / not run at all. The majority
> of tests are reliable and can catch regressions even while some tests
> are not yet.

I understand and acknowledge that it worked with Mesa. That's great for
Mesa. That still doesn't mean it's a panacea that fits every project.

> > No matter what we do here, what you describe will always happen. Like,
> > if we do flag those tests as unreliable, what exactly prevents another
> > issue from coming in on top, undetected, and what will happen when we
> > re-enable testing?
>
> Any issues affecting a test will need to be fixed before (re-)enabling
> the test for CI.

If that underlying issue is never fixed, at what point do we consider it
a failure that should never be re-enabled? And whose role is that?

> > On top of that, you kind of hinted at that yourself, but what set of
> > tests will pass is a property linked to a single commit. Having that
> > list within the kernel already alters that: you'll need to merge a new
> > branch, add a bunch of fixes and then change the test list state. You
> > won't have the same tree you originally tested (and defined the test
> > state list for).
>
> Ideally, the test state lists should be changed in the same commits
> which affect the test results. It'll probably take a while yet to get
> there for the kernel.
>
> > It might or might not be an issue for Linus' release, but I can
> > definitely see the trouble already for stable releases where fixes will
> > be backported, but the test state list certainly won't be updated.
>
> If the stable branch maintainers want to take advantage of CI for the
> stable branches, they may need to hunt for corresponding state list
> commits sometimes. They'll need to take that into account for their
> decision.

So we just expect the stable maintainers to track each and every patch
involved in a test run, make sure they are in a stable tree, and then
update the test list? Without having consulted them at all?

> >> By keeping those sets of expectations, we've been able to keep Mesa pretty
> >> clear of regressions, whilst having a very clear set of things that should
> >> be fixed to point to. It would be great if that set of things were zero,
> >> but it just isn't. Having that is far better than the two alternatives:
> >> either not testing at all (obviously bad), or having the test always be red
> >> so it's always ignored (might as well just not test).
> >
> > Isn't that what happens with flaky tests anyway?
>
> For a small minority of tests. Daniel was referring to whole test suites.
>
> > Even more so since we have 0 context when updating that list.
>
> The commit log can provide whatever context is needed.

Sure, but I've yet to see that happen. There are around 240 reported
flaky tests in 6.6-rc1, and none of them have any context. That new
series has a few dozen too, without any context either. And there's no
mention of that being the plan, or a patch adding such a policy for all
tests going forward. So I'm still fairly doubtful it will ever happen.

> > I've asked a couple of times, I'll ask again. In that other series, on
> > the MT8173, kms_hdmi_inject@inject-4k is set up as flaky (which is a KMS
> > test, btw).
> >
> > I'm a maintainer for that part of the kernel and I'd like to look into
> > it, because it's seriously something that shouldn't fail, ever: the
> > hardware isn't involved.
> >
> > How can I figure out now (or worse, let's say in a year) how to
> > reproduce it? What kernel version was affected? With what board? After
> > how many occurrences?
> >
> > Basically, how can I see that the bug is indeed there (or got fixed
> > since), and how to start fixing it?
>
> Many of those things should be documented in the commit log of the
> state list change.
>
> How the CI works in general should be documented in some appropriate
> place in tree.
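Just to make the ask concrete, here is a rough sketch (in Python, purely
for illustration) of what an annotated flake list and its CI-side
handling could look like. The file format, the field names and the
helpers below are hypothetical, not what drm-ci or Mesa actually ship;
the point is only the kind of context each entry would carry.

# Illustration only: a hypothetical annotated flake list plus the CI-side
# handling it would enable. The format and field names are made up for
# this sketch; they are not the actual drm-ci or Mesa expectation format.
from dataclasses import dataclass


@dataclass
class FlakeEntry:
    test: str          # e.g. "kms_hdmi_inject@inject-4k"
    board: str         # board the flake was observed on
    kernel: str        # kernel version it was last observed with
    failure_rate: str  # e.g. "3/500 runs"
    report: str        # URL of the bug report or failing CI job


def parse_flakes(text: str) -> dict[str, FlakeEntry]:
    """Parse a hypothetical '#'-annotated flake list, e.g.:

        # board: mt8173-some-board
        # kernel: v6.6-rc1
        # failure-rate: 3/500
        # report: https://ci.example.org/issues/1234
        kms_hdmi_inject@inject-4k
    """
    entries: dict[str, FlakeEntry] = {}
    meta: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            key, _, value = line.lstrip("# ").partition(":")
            meta[key.strip()] = value.strip()
        else:
            entries[line] = FlakeEntry(
                test=line,
                board=meta.get("board", "unknown"),
                kernel=meta.get("kernel", "unknown"),
                failure_rate=meta.get("failure-rate", "unknown"),
                report=meta.get("report", "none"),
            )
            meta = {}  # the metadata applies to the next test only
    return entries


def classify(test: str, result: str, flakes: dict[str, FlakeEntry],
             expected_failures: set[str]) -> str:
    """Decide what a single test result means for the pipeline.

    This is also why a flaky test can't simply be listed as "failing":
    it passes some of the time, so either outcome has to be accepted
    without blocking the merge, while still being reported so the
    underlying bug has a chance of getting fixed. (Mesa's CI goes one
    step further and retries unexpected failures, as described above.)
    """
    if test in flakes:
        return "report-only"            # never gates the merge
    if result == "fail":
        return "expected-fail" if test in expected_failures else "regression"
    # an unexpected pass means the expectation list has gone stale
    return "stale-expectation" if test in expected_failures else "pass"

With entries like that in the tree, a driver maintainer could at least
tell which board and kernel version an entry was recorded against, how
often it failed, and where to look to check whether it is still
reproducible a year later.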
I think I'll stop the discussion there. It was merged anyway, so I'm not
quite sure why I was asked to give my feedback on this.

Every concern I raised was met with a giant "it worked on Mesa" handwave
or "someone will probably work on it at some point". And fine, I guess
I'm wrong.

Thanks
Maxime