Re: RFC - kernel selftest result documentation (KTAP)

Brendan Higgins <brendanhiggins@xxxxxxxxxx> · Fri, 19 Jun 2020 12:39:53 -0700

On Tue, Jun 16, 2020 at 2:16 PM Bird, Tim <Tim.Bird@xxxxxxxx> wrote:
>
>
>
> > -----Original Message-----
> > From: Brendan Higgins
> >
> > On Wed, Jun 10, 2020 at 06:11:06PM +0000, Bird, Tim wrote:
> > > Some months ago I started work on a document to formalize how
> > > kselftest implements the TAP specification.  However, I didn't finish
> > > that work.  Maybe it's time to do so now.
> > >
> > > kselftest has developed a few differences from the original
> > > TAP specification, and  some extensions that I believe are worth
> > > documenting.
> > >
> > > Essentially, we have created our own KTAP (kernel TAP)
> > > format.  I think it is worth documenting our conventions, in order to
> > > keep everyone on the same page.
> > >
> > > Below is a partially completed document on my understanding
> > > of KTAP, based on examination of some of the kselftest test
> > > output.  I have not reconciled this with the kunit output format,
> > > which I believe has some differences (which maybe we should
> > > resolve before we get too far into this).
> > >
> > > I submit the document now, before it is finished, because a patch
> > > was recently introduced to alter one of the result conventions
> > > (from SKIP='not ok' to SKIP='ok').
> > >
> > > See the document include inline below
> > >
> > > ====== start of ktap-doc-rfc.txt ======
> >
> > [...]
> >
> > > --- from here on is not-yet-organized material
> > >
> > > Tip:
> > >  - don't change the test plan based on skipped tests.
> > >    - it is better to report that a test case was skipped, than to
> > >      not report it
> > >    - that is, don't adjust the number of test cases based on skipped
> > >      tests
> > >
> > > Other things to mention:
> > > TAP13 elements not used:
> > >  - yaml for diagnostic messages
> >
> > We talked about this before, but I would like some way to get failed
> > expectation/assertion information in the test in a consistent machine
> > parsible way. Currently we do the following:
> >
> >     # Subtest: example
> >     1..1
> >     # example_simple_test: initializing
> >     # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
> >     Expected 1 + 1 == 3, but
> >         1 + 1 == 2
> >         3 == 3
> >     not ok 1 - example_simple_test
> > not ok 5 - example
> >
> > Technically not TAP compliant, but no one seems to mind. I am okay with
> > keeping it the way it is, but if we don't want it in the KTAP spec, we
> > will need some kind of recourse.
>
> So far, most of the CI systems don't parse out diagnostic data, so it doesn't
> really matter what the format is.  If it's useful for humans, it's valuable as is.
> However, it would be nice if that could change.  But without some formalization
> of the format of the diagnostic data, it's an intractable problem for CI systems
> to parse it.  So it's really a chicken and egg problem.  To solve it, we would have
> to determine what exactly needs to be provided on a consistent basis for diagnostic
> data across many tests.  I think that it's too big a problem to handle right now.
> I'm not opposed to migrating to some structure with yaml in the future, but free
> form text output seems OK for now.

Well as long as everyone is cool with it for now we can put the
problem for later.

> > >    - reason: try to keep things line-based, since output from other things
> > >    may be interspersed with messages from the test itself
> > >  - TODO directive
> >
> > Is this more of stating a fact or desire? We don't use TODO either, but
> > it looks like it could be useful.
> Just stating a fact.  I didn't find TODO in either KUnit or selftest in
> November when I initially wrote this up.  If TODO serves as a kind
> of XFAIL, it could be useful.  I have nothing against it.

Fair enough.

> > > KTAP Extensions beyond TAP13:
> > >  - nesting
> > >    - via indentation
> > >      - indentation makes it easier for humans to read
> > >  - test identifier
> > >     - multiple parts, separated by ':'
> >
> > Can you elabroate on this more? I am not sure what you mean.
> An individual test case can have a name that is scoped by a containing
> test or test suite.  For example: selftests: cpufreq: main.sh
> This test identifier consists of the test system (selftests), the test
> area (cpufreq), and the test case name (main.sh).  This one's a bit
> weird because the test case name is just the name of the program
> in that test area.  The program itself doesn't output data in TAP format,
> and the harness uses it's exit code to detect PASS/FAIL.  if main.sh had
> multiple test cases, it might produce test identifiers like this:
> selftests: cpufreq: main: check_change_afinity_mask
> selftests: cpufreq: main: check_permissions_for_mask_operation
> (Or it might just produce the last part of these strings, the
> testcase names, and the testcase id might be something generated
> by the harness or CI system.)

+Alan Maguire

Aha, that is something we (Alan, David, Kees, and myself) were talking
about on another thread:

https://lore.kernel.org/linux-kselftest/CABVgOSnjrzfFOMF0VE1-5Ks-e40fc0XZsNZ92jE60ZOhYmZWog@xxxxxxxxxxxxxx/T/#m682be9f9103f7b363b702e49c137d83a4833fcae

I think that makes a lot of sense if it isn't too hard in practice.

> The value of having a single string to identify the testcase (like a
> uniform resource locator), is that it's easier to use the string to
> correlate results produced from different CI system that are executing
> the same test.

Makes sense.

> > >  - summary lines
> > >    - can be skipped by CI systems that do their own calculations
> > >
> > > Other notes:
> > >  - automatic assignment of result status based on exit code
> > >
> > > Tips:
> > >  - do NOT describe the result in the test line
> > >    - the test case description should be the same whether the test
> > >      succeeds or fails
> > >    - use diagnostic lines to describe or explain results, if this is
> > >      desirable
> > >  - test numbers are considered harmful
> > >    - test harnesses should use the test description as the identifier
> > >    - test numbers change when testcases are added or removed
> > >      - which means that results can't be compared between different
> > >        versions of the test
> > >  - recommendations for diagnostic messages:
> > >    - reason for failure
> > >    - reason for skip
> > >    - diagnostic data should always preceding the result line
> > >      - problem: harness may emit result before test can do assessment
> > >        to determine reason for result
> > >      - this is what the kernel uses
> > >
> > > Differences between kernel test result format and TAP13:
> > >  - in KTAP the "# SKIP" directive is placed after the description on
> > >    the test result line
> > >
> > > ====== start of ktap-doc-rfc.txt ======
> > > OK - that's the end of the RFC doc.
> > >
> > > Here are a few questions:
> > >  - is this document desired or not?
> > >     - is it too long or too short?
> > >  - if the document is desired, where should it be placed?
> >
> > I like it. I don't think we can rely on the TAP people updating their
> > stuff based on my interactions with them. So having a spec which is
> > actually maintained would be nice.
> >
> > Maybe in Documentation/dev-tools/ ?
> I'm leaning towards Documentation/dev-tools/test-results_format.rst

SGTM.

> > >    I assume somewhere under Documentation, and put into
> > >    .rst format. Suggestions for a name and location are welcome.
> > >  - is this document accurate?
> > >    I think KUNIT does a few things differently than this description.
> > >    - is the intent to have kunit and kselftest have the same output format?
> > >       if so, then these should be rationalized.
> >
> > Yeah, I think it would be nice if all test frameworks/libraries for the
> > kernel output tests in the same language.
> Agreed.

Cheers