On Wed, May 11, 2022 at 9:43 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> On Wed, 11 May 2022 at 15:33, Rob Clark <robdclark@xxxxxxxxx> wrote:
> > On Wed, May 11, 2022 at 4:50 AM Greg Kroah-Hartman
> > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Wed, May 11, 2022 at 12:26:05PM +0200, Michel Dänzer wrote:
> > > > On 2022-05-11 08:22, Greg Kroah-Hartman wrote:
> > > > > On Wed, May 11, 2022 at 03:06:47PM +1000, Dave Airlie wrote:
> > > > >>> And use it to store expectations about what the drm/msm driver is
> > > > >>> supposed to pass in the IGT test suite.
> > > > >>
> > > > >> I wanted to loop in Linus/Greg to see if there are any issues raised
> > > > >> by adding CI results files to the tree in their minds, or if any
> > > > >> other subsystem has done this already, and it's all fine.
> > > > >
> > > > > Why do the results need to be added to the tree?  Shouldn't they be
> > > > > either "all is good" or "constantly changing and a constant churn"?
> > > > >
> > > > >> I think this is a good thing after our Mesa experience, but Mesa has
> > > > >> a lot tighter integration here, so I want to get some more opinions
> > > > >> outside the group.
> > > > >
> > > > > For systems that have "tight integration" this might make sense as
> > > > > proof that all is working for a specific commit, but I can't see how
> > > > > this will help the kernel out much.
> > > > >
> > > > > What are you going to do with these results being checked in all the
> > > > > time?
> > > >
> > > > Having the expected results in the tree keeps them consistent with the
> > > > driver code itself, and allows putting in place gating CI to prevent
> > > > merging driver changes which make any of the tests deviate from the
> > > > expected result.
> > >
> > > Shouldn't "expected result" always be "pass"?
> > >
> > > If not, then the test should be changed to be "skipped" like we have
> > > today in the kselftest tests.
> >
> > No, we want to run tests even if they are expected to fail.  This
> > prevents the scenario of a test getting fixed without being noticed
> > (for example, a developer working on fixing test A might not notice
> > that the fix also fixed test B).  If a fix goes unnoticed, a later
> > regression would also go unnoticed ;-)
> >
> > I was skeptical about this approach at first with mesa CI, but having
> > used mesa CI for a while, I am now a firm believer in the approach.
> >
> > And ofc we want the expectations to be in the kernel tree, because
> > there could be, for example, differences between -fixes and -next
> > branches.  (Or even stable kernel branches, if/when we get to the
> > point of running CI on those.)
>
> Yeah, result files in tree are kinda needed, even more so for the
> kernel.  A lot of the linux-next integration testing is only done after
> patches have landed, and sometimes such breakage makes it to upstream
> and then into the subsystem/driver tree.  Annotating in the backmerge
> what exactly broke and why helps a lot with tracking issues.
>
> And expecting every subsystem to run every other subsystem's tests,
> especially tests that run on hw, is just not going to scale.  So there
> will be all kinds of differences in test results.
>
> > > And how about tying this into the kselftest process as well?  Why
> > > would this be somehow separate from the rest of the kernel tests?
> > >
> > > > Keeping them separate inevitably results in divergence between the
> > > > driver code and the expected test results, which would result in
> > > > spurious failures of such CI.
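
(To make the "expected results" concrete: the expectation files here
are just flat text listing, per device, which tests currently deviate
from "pass".  Something roughly like the sketch below; the path and
test names are made up for illustration, not the actual RFC contents:

  # drivers/gpu/drm/msm/ci/sdm845-fails.txt -- hypothetical path/name
  # one line per test whose expected result is not "pass"
  kms_cursor_legacy@basic-flip-before-cursor-atomic,Fail
  kms_plane@pixel-format,Fail

CI then compares actual results against the expectations in both
directions, so an unexpected pass shows up just as loudly as an
unexpected fail.)
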
> > >
> > > Again, "pass" should be the expected result :)
> > >
> > > > I expect the main complication for the kernel will be due to driver
> > > > changes merged via different trees, e.g. for cross-subsystem
> > > > reworks.  Since those will not go through the same CI, they may
> > > > accidentally introduce inconsistencies.  The ideal solution for this
> > > > IMO would be centralizing CI such that the same gating tests have to
> > > > pass regardless of how the code is merged.  But there's likely quite
> > > > a long way to go until we get there. :)
> > >
> > > We have in-kernel tests for the rest of the kernel, why can't you put
> > > your testing stuff in there as well?
> >
> > We could ofc put a lot more of the gitlab yml and scripts into the
> > kernel tree.  Probably all of i-g-t is a bit much to put in the kernel
> > tree.  Not to mention I'd like to see this expand to also run some
> > deqp and/or piglit tests, which is definitely too much to vendor into
> > the kernel tree.
> >
> > The approach of this RFC was to put only what was absolutely required
> > in the kernel tree (such as expectations), and then link out to an
> > external drm-ci tree[1] which has all the necessary scripts and yml
> > for building and running tests, to avoid having to put a whole lot
> > more in the kernel tree.  (We should be specifying an exact commit-sha
> > for that tree, IMO, as it controls the version of i-g-t which gets
> > used, and we need to be able to update expectations in sync with an
> > i-g-t uprev, for example when new tests are added or if a test fix
> > caused a fail->pass transition.)
>
> Yeah, I think longer-term we should carry a lot more in upstream, at
> least anything that's shared across drivers wrt the ci integration (or
> build testing and running tests which are hw agnostic).  Maybe even
> igt, not sure (otoh xfstests isn't moving into the kernel either, and
> there's lots more like that).

A lot of the drm-ci tree is the scripts/etc for things like power
control, booting, etc., and a lot of that is identical to what we have
in the mesa tree (since the on-hw tests run on the same CI farms as
mesa-ci).

But ofc it can be re-used by other drivers via one line in the
toplevel $driver/ci/gitlab-ci.yml, ie:

  DRM_CI_PROJECT_PATH: &drm-ci-project-path gfx-ci/drm-ci

BR,
-R

> Personally I think long-term the only thing outside should be other
> repos with tests or stuff you need to run them, and not really the
> glue to make it all work in ci.  But that's maybe a bit too much
> wishful thinking if CI systems stay largely subsystem specific (which
> they currently are in many ways, with some overlap).
>
> But maybe there are enough random pieces to share here for a lot more
> in-tree to make sense, and imo the fewer extra steps and indirection
> CI testing and test updating have, the better.
>
> But like Rob says, eventually there's a limit, and when you put the
> entire GL/Vulkan stack + its conformance testsuite (which is
> maintained by Khronos somewhere completely different from both
> kernel.org and freedesktop.org) in the tree, then it's definitely too
> much and won't work.  And eventually we do want to run these things
> too (e.g. intel-gfx-ci does run mesa + piglit on every run).
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch