Re: [RFC] Test catalog template

David Gow <davidgow@xxxxxxxxxx> · Sat, 19 Oct 2024 14:36:22 +0800

On Sat, 19 Oct 2024 at 04:17, Donald Zickus <dzickus@xxxxxxxxxx> wrote:
>
> On Fri, Oct 18, 2024 at 3:22 AM David Gow <davidgow@xxxxxxxxxx> wrote:
> >
> > Hi Don,
> >
> > Thanks for putting this together: the discussion at Plumbers was very useful.
> >
> > On Tue, 15 Oct 2024 at 04:33, Donald Zickus <dzickus@xxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > At Linux Plumbers, a few dozen of us gathered together to discuss how
> > > to expose what tests subsystem maintainers would like to run for every
> > > patch submitted or when CI runs tests.  We agreed on a mock up of a
> > > yaml template to start gathering info.  The yaml file could be
> > > temporarily stored on kernelci.org until a more permanent home could
> > > be found.  Attached is a template to start the conversation.
> > >
> >
> > I think that there are two (maybe three) separate problems here:
> > 1. What tests do we want to run (for a given patch/subsystem/environment/etc)?
>
> My thinking is this is maintainer's choice.  What would they like to
> see run on a patch to verify its correctness?  I would like to think
> most maintainers already have scripts they run before commiting
> patches to their -next branch.  All I am trying to do is expose what
> is already being done I believe.
>

Agreed.

>
> > 2. How do we describe those tests in such a way that running them can
> > be automated?
>
> This is the tricky part.  But I am going to assume that if most
> maintainers run tests before committing patches to their -next branch,
> then there is a good chance those tests are scripted and command line
> driven (this is the kernel community, right :-) ).  So if we could
> expose those scripts and make the copy-and-pastable such that
> contributors or testers (including CI bots) can just copy and run
> them.  Some maintainers have more complex environments and separating
> command line driven tests from the environment scripts might be
> tricky.
>
> Does that sound reasonable?

Yeah: that's basically what I'd want.

>
> > 3. (Exactly what constitutes a 'test'? A single 'test', a whole suite
> > of tests, a test framework/tool? What about the environment: is, e.g.,
> > KUnit on UML different from KUnit on qemu-x86_64 different from KUnit
> > on qemu-arm64?)
> >
> > My gut feeling here is that (1) is technically quite easy: worst-case
> > we just make every MAINTAINERS entry link to a document describing
> > what tests should be run. Actually getting people to write these
> > documents and then run the tests, though, is very difficult.
>
> Well if I look at kunit or kselftest, really all you are doing as a
> subsystem maintainer is asking contributors or testers to run a 'make'
> command right?  Everything else is already documented I think.
>
> >
> > (2) is the area where I think this will be most useful. We have some
> > arbitrary (probably .yaml) file which describes a series of tests to
> > run in enough detail that we can automate it. My ideal outcome here
> > would be to have a 'kunit.yaml' file which I can pass to a tool
> > (either locally or automatically on some CI system) which will run all
> > of the checks I'd run on an incoming patch. This would include
> > everything from checkpatch, to test builds, to running KUnit tests and
> > other test scripts. Ideally, it'd even run these across a bunch of
> > different environments (architectures, emulators, hardware, etc) to
> > catch issues which only show up on big-endian or 32-bit machines.
> >
> > If this means I can publish that yaml file somewhere, and not only
> > give contributors a way to check that those tests pass on their own
> > machine before sending a patch out, but also have CI systems
> > automatically run them (so the results are ready waiting before I
> > manually review the patch), that'd be ideal.
>
> Yes, that is exactly the goal of this exercise.  :-) but instead of a
> kunit.yaml file, it is more of a test.yaml file with hundreds of
> subystems inside it (and probably a corresponding get_tests.pl
> script)[think how MAINTAINERS file operates and this is a sister
> file].
>
> Inside the 'KUNIT' section would be a container of tests that would be
> expected to run (like you listed).  Each test has its own command line
> and params.
>

Yeah. My hope is we can have a "run_tests" tool which parses that
file/files and runs everything.

So whether that ends up being:
run_tests --subsystem "KUNIT" --subsystem "MM"
or
run_test --file "kunit.yaml" --file "mm.yaml"
or even
run_test --patch "my_mm_and_kunit_change.patch"

A CI system can just run it against the changed files in patches, a
user who wants to double check something specific can override it to
force the tests for a subsystem which may be indirectly affected. And
if you're working on some new tests, or some private internal ones,
you can keep your own yaml file and pass that along too.

> >
> > > Longer story.
> > >
> > > The current problem is CI systems are not unanimous about what tests
> > > they run on submitted patches or git branches.  This makes it
> > > difficult to figure out why a test failed or how to reproduce.
> > > Further, it isn't always clear what tests a normal contributor should
> > > run before posting patches.
> > >
> > > It has been long communicated that the tests LTP, xfstest and/or
> > > kselftests should be the tests  to run.  However, not all maintainers
> > > use those tests for their subsystems.  I am hoping to either capture
> > > those tests or find ways to convince them to add their tests to the
> > > preferred locations.
> > >
> > > The goal is for a given subsystem (defined in MAINTAINERS), define a
> > > set of tests that should be run for any contributions to that
> > > subsystem.  The hope is the collective CI results can be triaged
> > > collectively (because they are related) and even have the numerous
> > > flakes waived collectively  (same reason) improving the ability to
> > > find and debug new test failures.  Because the tests and process are
> > > known, having a human help debug any failures becomes easier.
> > >
> > > The plan is to put together a minimal yaml template that gets us going
> > > (even if it is not optimized yet) and aim for about a dozen or so
> > > subsystems.  At that point we should have enough feedback to promote
> > > this more seriously and talk optimizations.
> > >
> > > Feedback encouraged.
> > >
> > > Cheers,
> > > Don
> > >
> > > ---
> > > # List of tests by subsystem
> >
> > I think we should split this up into several files, partly to avoid
> > merge conflicts, partly to make it easy to maintain custom collections
> > of tests separately.
> >
> > For example, fs.yaml could contain entries for both xfstests and fs
> > KUnit and selftests.
>
> I am not opposed to the idea.  But I am a fan of the user experience.
> So while an fs.yaml might sound good, is it obvious to a contributor
> or tester that given a patch, do they know if fs.yaml is the correct
> yaml file to parse when running tests?  How do you map a patch to a
> yaml file?  I was trying to use subsystems like MAINTAINERS (and
> get_maintainers.pl) as my mapping.  Open to better suggestions.
>

One option would be to have multiple files, which still have the
MAINTAINERS subsystems listed within, and worst-case a tool just
parses all of the files in that directory until it finds a matching
one. Maybe a bit slower than having everything in the one file, but it
sidesteps merge conflicts well.

But ideally, I'd like (as mentioned below) to have a tool which I can
use to run tests locally, and being able to run, e.g.,
./run_tests --all -f fs.yaml
If I want to specify the tests I want to run manually, personally I
think a filename would be a bit nicer than having to pass through,
e.g., subsystem names.

> >
> > It's also probably going to be necessary to have separate sets of
> > tests for different use-cases. For example, there might be a smaller,
> > quicker set of tests to run on every patch, and a much longer, more
> > expensive set which only runs every other day. So I don't think
> > there'll even be a 1:1 mapping between 'test collections' (files) and
> > subsystems. But an automated way of running "this collection of tests"
> > would be very useful, particularly if it's more user-friendly than
> > just writing a shell script (e.g., having nicely formatted output,
> > being able to run things in parallel or remotely, etc).
>
> I don't disagree.  I am trying to start small to get things going and
> some momentum.  I proposed a container of tests section.  I would like
> to think adding another field in each individual test area like
> (short, medium, long OR mandatory, performance, nice-to-have) would be
> easy to add to the yaml file overall and attempt to accomplish what
> you are suggesting.  Thoughts?
>

I think that'd be a great idea. Maybe a "stage" field could work, too,
where later tests only run if the previous ones pass. For example:
Stage 0: checkpatch, does it build
Stage 1: KUnit tests, unit tests, single architecture
Stage 2: Full boot tests, selftests, etc
Stage 3: The above tests on other architectures, allyesconfig, randconfig, etc.

Regardless, it'd be useful to be able to name individual tests and/or
configurations and manually trigger them and/or filter on them.

_Maybe_ it makes sense to split up the "what tests to run" and "how
are they run" bits. The obvious split here would be to have the test
catalogue just handle the former, and the "how they're run" bit
entirely live in shell scripts. But if we're going to support running
tests in parallel and nicely displaying results, maybe there'll be a
need to have something more data driven than a shell script.

> >
> > > #
> > > # Tests should adhere to KTAP definitions for results
> > > #
> > > # Description of section entries
> > > #
> > > #  maintainer:    test maintainer - name <email>
> > > #  list:                mailing list for discussion
> > > #  version:         stable version of the test
> > > #  dependency: necessary distro package for testing
> > > #  test:
> > > #    path:            internal git path or url to fetch from
> > > #    cmd:            command to run; ability to run locally
> > > #    param:         additional param necessary to run test
> > > #  hardware:      hardware necessary for validation
> > > #
> > > # Subsystems (alphabetical)
> > >
> > > KUNIT TEST:
> >
> > For KUnit, it'll be interesting to draw the distinction between KUnit
> > overall and individual KUnit suites.
> > I'd lean towards having a separate entry for each subsystem's KUnit
> > tests (including one for KUnit's own tests)
>
> KUNIT may not have been the best 'common' test example due to its
> complexities across other subsystems. :-/
>

Yeah: I think KUnit tests are a good example of the sorts of tests
which would be relatively easy to integrate, but KUnit as a subsystem
can be a bit confusing as an example because no-one's sure if we're
talking about KUnit-the-subsystem or KUnit-the-tests.

> >
> > >   maintainer:
> > >     - name: name1
> > >       email: email1
> > >     - name: name2
> > >       email: email2
> > >   list:
> >
> > How important is it to have these in the case where they're already in
> > the MAINTAINERS file? I can see it being important for tests which
> > live elsewhere, though eventually, I'd still prefer the subsystem
> > maintainer to take some responsibility for the tests run for their
> > subsystems.
>
> I wasn't sure if all subsystem maintainers actually want to maintain
> the tests too or just point someone else at it.  I look at LTP as an
> example here.  But I could be wrong.
>

Fair enough. Maybe we just make this optional, and if empty we
"default" to the subsystem maintainer.

> >
> > >   version:
> >
> > This field is probably unnecessary for test frameworks which live in
> > the kernel tree.
>
> Possibly.  It was brought up at Plumbers, so I included it for completeness.
>

Yeah. Again, good to have, but make it optional.

> >
> > >   dependency:
> > >     - dep1
> > >     - dep2
> >
> > If we want to automate this in any way, we're going to need to work
> > out a way of specifying these. Either we'd have to pick a distro's
> > package names, or have our own mapping.
>
> Agreed.  I might lean on what 'perf' outputs.  They do dependency
> detection and output suggested missing packages.  Their auto detection
> of already included deps is rather complicated though.
>

Sounds good.

> >
> > (A part of me really likes the idea of having a small list of "known"
> > dependencies: python, docker, etc, and trying to limit tests to using
> > those dependencies. Though there are plenty of useful tests with more
> > complicated dependencies, so that probably won't fly forever.)
>
> Hehe.  For Fedora/RHEL at least, python has hundreds of smaller
> library packages.  That is tricky.  And further some tests like to
> compile, which means a bunch of -devel packages.  Each distro has
> different names for their -devel packages. :-/
> But a side goal of this effort is to define some community standards.
> Perhaps we can influence things here to clean up this problem??
>

That'd be nice. :-)

> >
> > >   test:
> > >     - path: tools/testing/kunit
> > >       cmd:
> > >       param:
> > >     - path:
> > >       cmd:
> > >       param:
> >
> > Is 'path' here supposed to be the path to the test binary, the working
> > directory, etc?
> > Maybe there should be 'working_directory', 'cmd', 'args', and 'env'.
>
> The thought was the command to copy-n-paste to run the test after
> installing it.  I am thinking most tests might be a git-clone or
> exploded tarball, leaving the path to be from the install point.  So
> maybe working_directory is more descriptive.
>

Sounds good. In the KUnit case, the tooling currently expects the
working directory to be the root of the kernel checkout, and the
command to be "./tools/testing/kunit/kunit.py"...

> >
> > >   hardware: none
> >
> >
> >
> > For KUnit, I'd imagine having a kunit.yaml, with something like this,
> > including the KUnit tests in the 'kunit' and 'example' suites, and the
> > 'kunit_tool_test.py' test script:
> >
> > ---
> > KUnit:
> >   maintainer:
> >     - name: David Gow
> >       email: davidgow@xxxxxxxxxx
> >     - name: Brendan Higgins
> >       email: brendan.higgins@xxxxxxxxx
> >   list: kunit-dev@xxxxxxxxxxxxxxxx
> >   dependency:
> >     - python3
> >   test:
> >     - path: .
> >       cmd: tools/testing/kunit.py
> >       param: run kunit
> >     - path: .
> >       cmd: tools/testing/kunit.py
> >       param: run example
> >   hardware: none
> > KUnit Tool:
> >   maintainer:
> >     - name: David Gow
> >       email: davidgow@xxxxxxxxxx
> >     - name: Brendan Higgins
> >       email: brendan.higgins@xxxxxxxxx
> >   list: kunit-dev@xxxxxxxxxxxxxxxx
> >   dependency:
> >     - python3
> >   test:
> >     - path: .
> >       cmd: tools/testing/kunit_tool_test.py
> >       param:
> >   hardware: none
> > ---
> >
> > Obviously there's still some redundancy there, and I've not actually
> > tried implementing something that could run it. It also lacks any
> > information about the environment. In practice, I have about 20
> > different kunit.py invocations which run the tests with different
> > configs and on different architectures. Though that might make sense
> > to keep in a separate file to only run if the simpler tests pass. And
> > equally, it'd be nice to have a 'common.yaml' file with basic patch
> > and build tests which apply to almost everything (checkpatch, make
> > defconfig, maybe even make allmodconfig, etc).
>
> Nice, thanks for the more detailed example.
>

Thanks,
-- David
Attachment:
smime.p7s

Description: S/MIME Cryptographic Signature