Re: [Freedreno] Adding CI results to the kernel tree was Re: [RFC v2] drm/msm: Add initial ci/ subdirectory

Daniel Vetter <daniel@xxxxxxxx> · Wed, 11 May 2022 18:43:29 +0200

On Wed, 11 May 2022 at 15:33, Rob Clark <robdclark@xxxxxxxxx> wrote:
> On Wed, May 11, 2022 at 4:50 AM Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, May 11, 2022 at 12:26:05PM +0200, Michel Dänzer wrote:
> > > On 2022-05-11 08:22, Greg Kroah-Hartman wrote:
> > > > On Wed, May 11, 2022 at 03:06:47PM +1000, Dave Airlie wrote:
> > > >>> And use it to store expectations about what the drm/msm driver is
> > > >>> supposed to pass in the IGT test suite.
> > > >>
> > > >> I wanted to loop in Linus/Greg to see if there are any issues raised
> > > >> by adding CI results file to the tree in their minds, or if any other
> > > >> subsystem has done this already, and it's all fine.
> > > >
> > > > > Why do the results need to be added to the tree?  Shouldn't they be
> > > > either "all is good" or "constantly changing and a constant churn"?
> > > >
> > > >> I think this is a good thing after our Mesa experience, but Mesa has a
> > > >> lot tighter integration here, so I want to get some more opinions
> > > >> outside the group.
> > > >
> > > > For systems that have "tight integration" this might make sense as proof
> > > > that all is working for a specific commit, but I can't see how this will
> > > > help the kernel out much.
> > > >
> > > > What are you going to do with these results being checked in all the
> > > > time?
> > >
> > > Having the expected results in the tree keeps them consistent with the driver code itself, and allows putting in place gating CI to prevent merging driver changes which make any of the tests deviate from the expected result.
> >
> > Shouldn't "expected result" always be "pass"?
> >
> > If not, then the test should be changed to be "skipped" like we have
> > today in the kselftest tests.
>
> No, we want to run tests even if they are expected to fail.  This
> prevents the scenario of a test getting fixed without being noticed
> (for ex, developer was working on fixing test A and didn't notice that
> the fix also fixed test B).  If a fix goes unnoticed, a later
> regression would also go unnoticed ;-)
>
> I was skeptical about this approach at first with mesa CI, but having
> used mesa CI for a while, I am now a firm believer in the approach.
>
> And ofc we want the expectations to be in the kernel tree because
> there could be, for example, differences between -fixes and -next
> branches.  (Or even stable kernel branches if/when we get to the point
> of running CI on those.)

Yeah, result files in tree are kinda needed, even more so for the
kernel. A lot of the linux-next integration testing is only done after
patches have landed, and sometimes such breakage makes it to upstream
and then into the subsystem/driver tree. Annotating in the backmerge
what exactly broke and why helps a lot with tracking issues.

And expecting every subsystem to run every other subsystem's tests,
especially tests that run on hw, is just not going to scale. So there
will be all kinds of differences in test results.
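
To make the gating idea quoted above concrete (run the tests that are
expected to fail too, and flag any deviation from the in-tree
expectation, including an unexpected pass), here is a minimal sketch.
The function name, the test names and the dict-based expectations
format are all assumptions for illustration, not what the RFC or
drm-ci actually uses.

```python
# Hypothetical sketch: find_deviations() and the expectations format are
# assumptions for illustration, not the format used by the RFC or drm-ci.

def find_deviations(expected, actual):
    """Return {test: (expected_status, actual_status)} for every mismatch.

    Tests that are absent from `expected` are assumed to be expected to pass.
    """
    deviations = {}
    for test, status in actual.items():
        want = expected.get(test, "pass")
        if status != want:
            deviations[test] = (want, status)
    return deviations


# In-tree expectations: both tests are known failures on this branch.
expected = {"test_a": "fail", "test_b": "fail"}

# Results of a CI run after a change that was only meant to fix test_a.
actual = {"test_a": "pass", "test_b": "pass", "test_c": "pass"}

for test, (want, got) in sorted(find_deviations(expected, actual).items()):
    print(f"{test}: expected {want}, got {got}")
```

In this sketch the unexpected pass of test_b is reported alongside
test_a, so the expectations have to be updated in the same change, and
a later regression of test_b would show up as a deviation again.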

> > And how about tying this into the kselftest process as well, why would
> > this be somehow separate from the rest of the kernel tests?
> >
> > > Keeping them separate inevitably results in divergence between the driver code and the expected test results, which would result in spurious failures of such CI.
> >
> > Again, "pass" should be the expected results :)
> >
> > > I expect the main complication for the kernel will be due to driver changes merged via different trees, e.g. for cross-subsystem reworks. Since those will not go through the same CI, they may accidentally introduce inconsistencies. The ideal solution for this IMO would be centralizing CI such that the same gating tests have to pass regardless of how the code is merged. But there's likely quite a long way to go until we get there. :)
> >
> > We have in-kernel tests for the rest of the kernel, why can't you put
> > your testing stuff into there as well?
>
> We could ofc put a lot more of the gitlab yml and scripts into the
> kernel tree.  Probably all of i-g-t is a bit much to put in the kernel
> tree.  Not to mention I'd like to see this expand to also run some
> deqp and/or piglit tests, which is definitely too much to vendor into
> the kernel tree.
>
> The approach of this RFC was to put only what was absolutely required
> in the kernel tree (such as expectations), and then link out to an
> external drm-ci tree[1] which has all the necessary scripts and yml
> for building and running tests, to avoid having to put a whole lot
> more in the kernel tree. (We should be specifying exact commit-sha for
> that tree, IMO, as it controls the version of i-g-t which gets used,
> and we need to be able to update expectations in sync with an i-g-t
> uprev, for example when new tests are added or if a test fix caused a
> fail->pass transition.)

Yeah I think longer-term we should carry a lot more in upstream, at
least anything that's shared across drivers wrt the ci integration (or
build testing and running tests which are hw agnostic). Maybe even
igt, not sure (otoh xfs-tests isn't moving into the kernel either, and
there's lots more like that).

Personally I think long-term the only thing outside should be other
repos with tests or stuff you need to run them, and not really the
glue to make it all work in ci. But that's maybe a bit too much
wishful thinking if CI systems stay largely subsystem specific (which
they currently are in many ways, with some overlap).

But maybe there are enough random pieces to share here for a lot more
in-tree to make sense, and imo the fewer extra steps and indirection
CI testing and test updating has, the better.

But like Rob says, eventually there's a limit and when you put the
entire GL/vulkan stack + its conformance testsuite (which is
maintained by khronos somewhere completely different than both
kernel.org and freedesktop.org) then it's definitely too much and won't
work. And eventually we do want to run these things too (e.g.
intel-gfx-ci does run mesa + piglit on every run).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch