Re: [PATCH v11] drm: Add initial ci/ subdirectory

Michel Dänzer <michel.daenzer@xxxxxxxxxxx> · Mon, 11 Sep 2023 14:13:43 +0200

On 9/11/23 11:34, Maxime Ripard wrote:
> On Thu, Sep 07, 2023 at 01:40:02PM +0200, Daniel Stone wrote:
>> Yeah, this is what our experience with Mesa (in particular) has taught us.
>>
>> Having 100% of the tests pass 100% of the time on 100% of the platforms is a
>> great goal that everyone should aim for. But it will also never happen.
>>
>> Firstly, we're just not there yet today. Every single GPU-side DRM driver
>> has userspace-triggerable faults which cause occasional errors in GL/Vulkan
>> tests. Every single one. We deal with these in Mesa by retrying; if we
>> didn't retry, across the breadth of hardware we test, I'd expect 99% of
>> should-succeed merges to fail because of these intermittent bugs in the DRM
>> drivers.
> 
> So the plan is only to ever test rendering devices? It should have been
> made clearer then.
> 
>> We don't have the same figure for KMS - because we don't test it - but
>> I'd be willing to bet no driver is 100% if you run tests often enough.
> 
> And I would still consider that a bug that we ought to fix, and
> certainly not something we should sweep under the rug. If half the tests
> are not running on a driver, then fine, they aren't. I'm not really
> against having failing tests, I'm against not flagging unreliable tests
> on a given hardware as failing tests.

A flaky test will by definition give a "pass" result at least some of the time, which would be considered a failure by the CI if the test is marked as failing.
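
To make that concrete, here is a minimal sketch of how a CI might judge a single result against its expectation lists. It is not the actual drm-ci or Mesa implementation; the function name, the list names and the test name are hypothetical and only illustrate the logic.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"


def ci_accepts(test, outcome, expected_fails, flakes):
    """Return True if the CI accepts this result, False if it flags it.

    Hypothetical model: `expected_fails` and `flakes` are sets of test
    names standing in for the per-driver state lists.
    """
    if test in flakes:
        # Flaky tests may pass or fail; neither result breaks the run.
        return True
    if test in expected_fails:
        # A test marked as failing must fail; a pass is unexpected.
        return outcome is Outcome.FAIL
    # Every other test is expected to pass.
    return outcome is Outcome.PASS


# Hypothetical test name, used only for illustration.
name = "hypothetical_test@subtest"

# Marked as failing: every occasional pass gets flagged by the CI.
marked_failing_pass = ci_accepts(name, Outcome.PASS, {name}, set())
marked_failing_fail = ci_accepts(name, Outcome.FAIL, {name}, set())

# Marked as flaky: both results are tolerated.
marked_flaky_pass = ci_accepts(name, Outcome.PASS, set(), {name})
marked_flaky_fail = ci_accepts(name, Outcome.FAIL, set(), {name})
```

With the test in the failing list, the occasional pass is rejected, so an intermittent test cannot live there without breaking unrelated merges.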

>> Secondly, we will never be there. If we could pause for five years and sit
>> down making all the current usecases for all the current hardware on the
>> current kernel run perfectly, we'd probably get there. But we can't: there's
>> new hardware, new userspace, and hundreds of new kernel trees.
> 
> Not with that attitude :)

Attitude is not the issue; the complexity of the multiple systems involved is.

> I'm not sure it's actually an argument, really. 10 years ago, we would
> never have been at "every GPU on the market has an open-source driver"
> here. 5 years ago, we would never have been at this-series-here. That
> didn't stop anyone making progress, everyone involved in that thread
> included.

Even assuming perfection is achievable at all (which is very doubtful, given the experience from the last few years of CI in Mesa and other projects), if you demand perfection before even taking the first step, the effort will never get off the ground.

> How are we even supposed to detect those failures in the first
> place if tests are flagged as unreliable?

Based on experience with Mesa, only a relatively small minority of tests should need to be marked as flaky / not run at all. The majority of tests are reliable and can catch regressions even while some others are not yet reliable.

> No matter what we do here, what you describe will always happen. Like,
> if we do flag those tests as unreliable, what exactly prevents another
> issue to come on top undetected, and what will happen when we re-enable
> testing?

Any issues affecting a test will need to be fixed before (re-)enabling the test for CI.

> On top of that, you kind of hinted at that yourself, but what set of
> tests will pass is a property linked to a single commit. Having that
> list within the kernel already alters that: you'll need to merge a new
> branch, add a bunch of fixes and then change the test list state. You
> won't have the same tree you originally tested (and defined the test
> state list for).

Ideally, the test state lists should be changed in the same commits which affect the test results. It'll probably take a while yet to get there for the kernel.

> It might or might not be an issue for Linus' release, but I can
> definitely see the trouble already for stable releases where fixes will
> be backported, but the test state list certainly won't be updated.

If the stable branch maintainers want to take advantage of CI for the stable branches, they may sometimes need to hunt for the corresponding state list commits. They'll need to take that into account in their decision.

>> By keeping those sets of expectations, we've been able to keep Mesa pretty
>> clear of regressions, whilst having a very clear set of things that should
>> be fixed to point to. It would be great if those set of things were zero,
>> but it just isn't. Having that is far better than the two alternatives:
>> either not testing at all (obviously bad), or having the test always be red
>> so it's always ignored (might as well just not test).
> 
> Isn't that what happens with flaky tests anyway?

For a small minority of tests. Daniel was referring to whole test suites.

> Even more so since we have 0 context when updating that list.

The commit log can provide whatever context is needed.

> I've asked a couple of times, I'll ask again. In that other series, on
> the MT8173, kms_hdmi_inject@inject-4k is setup as flaky (which is a KMS
> test btw).
> 
> I'm a maintainer for that part of the kernel, I'd like to look into it,
> because it's seriously something that shouldn't fail, ever, the hardware
> isn't involved.
> 
> How can I figure out now (or worse, let's say in a year) how to
> reproduce it? What kernel version was affected? With what board? After
> how many occurrences?
> 
> Basically, how can I see that the bug is indeed there (or got fixed
> since), and how to start fixing it?

Many of those things should be documented in the commit log of the state list change.

How the CI works in general should be documented in some appropriate place in the tree.

-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer