Re: [PATCH 1/3] MAINTAINERS: Introduce V: field for required tests

On Mon, Nov 20, 2023 at 03:51:31PM -0500, Theodore Ts'o wrote:

> What we have at work is a way to upload the test results summary
> (e.g., just KTAP result lines, or the xfstests junit XML) along with
> test run metadata (e.g., what was the kernel commit on which the test
> was run, and the test hardware), and this would be stored permanently.
> Test artifacts are also preserved, but for a limited amount of time
> (e.g., some number of months or a year).

> The difference in storage lifetimes is because the junit XML file
> might be a few kilobytes to tens of kilobytes, but the test artifacts
> might be a few megabytes to tens of megabytes.

This is the sort of thing that kcidb (which Nikolai works on) is good at
ingesting; I already push all my CI's test results into it:

   https://github.com/kernelci/kcidb/

(The dashboard is currently down.)  A few other projects, including the
current KernelCI and Red Hat's CKI, push their data in there too; I'm
sure Nikolai would be delighted to get more people pushing data in.  The
goal is to merge this with the main KernelCI infrastructure; it's
currently kept separate while people figure out the big data side of
things.
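
For a flavour of what goes in, a submission is just a JSON document; a
minimal sketch in Python (the field names and schema version here are
from memory and illustrative only, the kcidb repo has the authoritative
schema):

    import json

    # A single test result in (roughly) kcidb's I/O format.  The
    # "myci" origin and all IDs below are hypothetical.
    report = {
        "version": {"major": 4, "minor": 0},
        "checkouts": [{
            "id": "myci:checkout-1",
            "origin": "myci",
            "git_commit_hash": "0123456789abcdef0123456789abcdef01234567",
        }],
        "builds": [{
            "id": "myci:build-1",
            "checkout_id": "myci:checkout-1",
            "origin": "myci",
            "architecture": "arm64",
        }],
        "tests": [{
            "id": "myci:test-1",
            "build_id": "myci:build-1",
            "origin": "myci",
            "path": "kselftest.timers",
            "status": "PASS",
        }],
    }

    # Feed this to the kcidb-submit tool (or the kcidb Python client):
    print(json.dumps(report))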

> Of course once you have this data, it becomes possible to detect when
> a test may have regressed, or to detect flaky tests, and perhaps to
> figure out if certain hardware configurations or kernel configurations
> are more likely to trigger a particular test to fail.  So having all
> of this data stored centrally would be really cool.  The only question
> is who might be able to create such an infrastructure, and be able to
> pay for the ongoing development and operational costs....

The KernelCI LF project is funding kcidb with precisely this goal, for
the reasons you outline.  The data collection part seems to be
relatively mature at this point, but AIUI there are still a bunch of
open questions on the analysis and usage side, partly due to needing to
find people to work on it.  My understanding is that ingesting large
data sets into cloud providers is pretty tractable; as with a lot of
this stuff, it gets more interesting when you try to pull the data back
out and make sense of it in a practical fashion.  It'd be really cool to
see more people working on that side of things.
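
Just to make that concrete, the kind of analysis Ted describes falls
out fairly naturally once everything is queryable in one place; a toy
sketch (the names and threshold are mine, and the query layer is
assumed) for flagging flaky tests:

    from collections import defaultdict

    # rows: (test_path, status) pairs pulled back out of the results
    # store; only the analysis step is sketched here.
    def find_flaky(rows, min_runs=10):
        counts = defaultdict(lambda: [0, 0])   # path -> [passes, fails]
        for path, status in rows:
            if status == "PASS":
                counts[path][0] += 1
            elif status == "FAIL":
                counts[path][1] += 1
        flaky = []
        for path, (passes, fails) in counts.items():
            runs = passes + fails
            # Mixed outcomes are the flakiness signal; a real version
            # would also group by kernel commit, config and hardware.
            if runs >= min_runs and 0 < fails < runs:
                flaky.append((path, fails / runs))
        return sorted(flaky, key=lambda t: t[1], reverse=True)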

On the submission side it'd be interesting to start collecting more data
about the test systems used to run things; it might be useful to add a
new schema for that which can be referenced from the test schema.
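
Purely as a strawman (nothing like this exists in the schema today, all
the fields below are invented), a test system record might carry
something like:

    # Strawman test-system record; every field is invented for
    # illustration and not part of any existing kcidb schema.
    test_system = {
        "id": "myci:board-7",
        "origin": "myci",
        "soc": "qcom,sdm845",
        "ram_mb": 4096,
        "firmware": "u-boot 2023.10",
    }

    # A test result could then reference it:
    test_result = {"id": "myci:test-1", "system_id": "myci:board-7"}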


