RE: RFC - kernel selftest result documentation (KTAP)

"Bird, Tim" <Tim.Bird@xxxxxxxx> · Mon, 15 Jun 2020 17:45:28 +0000

> -----Original Message-----
> From: Kees Cook
> Sent: Sunday, June 14, 2020 12:18 PM
> To: David Gow <davidgow@xxxxxxxxxx>
> Cc: Vitor Massaru Iha <vitor@xxxxxxxxxxx>; KUnit Development <kunit-dev@xxxxxxxxxxxxxxxx>; Shuah Khan
> <skhan@xxxxxxxxxxxxxxxxxxx>; open list:KERNEL SELFTEST FRAMEWORK <linux-kselftest@xxxxxxxxxxxxxxx>; Linux Kernel Mailing List
> <linux-kernel@xxxxxxxxxxxxxxx>; Brendan Higgins <brendanhiggins@xxxxxxxxxx>; linux-kernel-mentees@xxxxxxxxxxxxxxxxxxxxxxxxx;
> linux@xxxxxxxxxxxxxxxxxx
> Subject: Re: RFC - kernel selftest result documentation (KTAP)
> 
> On Sat, Jun 13, 2020 at 02:51:17PM +0800, David Gow wrote:
> > On Sat, Jun 13, 2020 at 6:36 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > > Regarding output:
> > >
> > > [   36.611358] TAP version 14
> > > [   36.611953]     # Subtest: overflow
> > > [   36.611954]     1..3
> > > ...
> > > [   36.622914]     # overflow_calculation_test: s64: 21 arithmetic tests
> > > [   36.624020]     ok 1 - overflow_calculation_test
> > > ...
> > > [   36.731096]     # overflow_shift_test: ok: (s64)(0 << 63) == 0
> > > [   36.731840]     ok 2 - overflow_shift_test
> > > ...
> > > [   36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL),
> nodemask=(null),cpuset=/,mems_allowed=0
> > > ...
> > > [   36.805350]     # overflow_allocation_test: devm_kzalloc detected saturation
> > > [   36.807763]     ok 3 - overflow_allocation_test
> > > [   36.807765] ok 1 - overflow
> > >
> > > A few things here....
> >
> > Tim Bird has just sent out an RFC for a "KTAP" specification, which
> > we'll hope to support in KUnit:
> 
> Ah-ha! Thanks for the heads-up. :)
> 
> > https://lore.kernel.org/linux-
> kselftest/CY4PR13MB1175B804E31E502221BC8163FD830@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/T/#u
> 
> *thread split/merge*
> 
> This is coming from:
> https://lore.kernel.org/linux-kselftest/CABVgOSnofuJQ_fiCL-8KdKezg3Hnqk3A+X509c4YP_toKeBVBg@xxxxxxxxxxxxxx/
> But I'm attempting a thread jump... ;)
> 
> > That's probably where we'll end up trying to hash out exactly what
> > this format should be. Fortunately, I don't think any of these will
> > require any per-test work, just changes to the KUnit implementation.
> 
> Yup, good.
> 
> > > - On the outer test report, there is no "plan" line (I was expecting
> > >   "1..1"). Technically it's optional, but it seems like the information
> > >   is available. :)
> >
> > There's work underway to support this, but it's hit a few minor snags:
> > https://lkml.org/lkml/2020/2/27/2155
> 
> Okay, cool. It's not critical, I don't think.
> 
> > > - The subtest should have its own "TAP version 14" line, and it should
> > >   be using the diagnostic line prefix for the top-level test (this is
> > >   what kselftest is doing).
> >
> > Alas, TAP itself hasn't standardised subtests. Personally, I think it
> > doesn't fundamentally matter which way we do this (I actually prefer
> > the way we're doing it currently slightly), but converging with what
> > kselftest does would be ideal.
> 
> I see the KTAP RFC doesn't discuss subtests at all, but kselftest actually
> already handles subtests. I strongly feel that line-start formatting is
> the correct way to deal with this, with each subtest having it's own
> self-contained KTAP. This allows for several important features:
> 
> - the subtest, run on its own, needs no knowledge about its execution
>   environment: it simply emits its standard KTAP output.
> 
> - subtest output can be externally parsed separably, without any
>   knowledge or parsing of the enclosing test.

I agree with both of these, but will save additional comments for the
thread on my KTAP doc RFC.

> 
> For example, with my recent series[1], "make -C seccomp run_tests"
> produces:
> 
> TAP version 13
> 1..2
> # selftests: seccomp: seccomp_bpf
> # TAP version 13
> # 1..77
> # # Starting 77 tests from 6 test cases.
> # #  RUN           global.mode_strict_support ...
> # #            OK  global.mode_strict_support
> # ok 1 global.mode_strict_support
> ...
> # ok 77 TSYNC.two_siblings_not_under_filter
> # # FAILED: 73 / 77 tests passed.
> # # Totals: pass:72 fail:4 xfail:1 xpass:0 skip:0 error:0
> not ok 1 selftests: seccomp: seccomp_bpf # exit=1
> # selftests: seccomp: seccomp_benchmark
> #
> not ok 2 selftests: seccomp: seccomp_benchmark # TIMEOUT
> 
> > > - There is no way to distinguish top-level TAP output from kernel log
> > >   lines. I think we should stick with the existing marker, which is
> > >   "# ", so that kernel output has no way to be interpreted as TAP
> > >   details -- unless it intentionally starts adding "#"s. ;)
> >
> > At the moment, we're doing this in KUnit tool by stripping anything
> > before "TAP version 14" (e.g., the timestamp), and then only incuding
> > lines which parse correctly (are a test plan, result, or a diagnostic
> > line beginning with '#').
> > This has worked pretty well thus far, and we do have the ability to
> > get results from debugfs instead of the kernel log, which won't have
> > the same problems.
> >
> > It's worth considering, though, particularly since our parser should
> > handle this anyway without any changes.
> 
> For the KTAP parsing, actually think it's very important to include
> kernel log lines in the test output (as diagnostic lines), since they
> are "unexpected" (they fail to have the correct indentation) and may
> provide additional context for test failures when reading log files.
> (As in the "vmalloc: allocation failure" line in the quoted section
> above, to be included as a diagnostic line for test #3.)
> 
> > > - There is no summary line (to help humans). For example, the kselftest
> > >   API produces a final pass/fail report.
> >
> > Currently, we're relying on the kunit.py script to produce this, but
> > it shouldn't be impossible to add to the kernel, particularly once the
> > "KUnit Executor" changes mentioned above land.
> 
> Cool. Yeah, it's not required, but I think there are two use cases:
> humans running a single test (where is a summary is valuable, especially
> for long tests that scroll off the screen), and automation (where it can
> ignore the summary, as it will produce its own in a regularized fashion).
> 
> > > Taken together, I was expecting the output to be:
> > >
> > > [   36.611358] # TAP version 14
> > > [   36.611953] # 1..1
> > > [   36.611958] # # TAP version 14
> > > [   36.611954] # # 1..3
> > > ...
> > > [   36.622914] # # # overflow_calculation_test: s64: 21 arithmetic tests
> > > [   36.624020] # # ok 1 - overflow_calculation_test
> > > ...
> > > [   36.731096] # # # overflow_shift_test: ok: (s64)(0 << 63) == 0
> > > [   36.731840] # # ok 2 - overflow_shift_test
> > > ...
> > > [   36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL),
> nodemask=(null),cpuset=/,mems_allowed=0
> > > ...
> > > [   36.805350] # # # overflow_allocation_test: devm_kzalloc detected saturation
> > > [   36.807763] # # ok 3 - overflow_allocation_test
> > > [   36.807763] # # # overflow: PASS
> > > [   36.807765] # ok 1 - overflow
> > > [   36.807767] # # kunit: PASS
> > >
> > > But I assume there are threads on this that I've not read... :)
> >
> > These discussions all seem to be coming to a head now, so this is
> > probably just the kick we need to prioritise this a bit more. The KTAP
> > thread hasn't covered all of these (particularly the subtest stuff)
> > yet, so I confess I hadn't realised the extent of the divergence
> > between KUnit and kselftest here.
> 
> Cool. Yeah, I'd like to keep things as close as possible. In looking at
> this again, I wonder if perhaps it would be better to change the
> "indent" from "# " to "  ", which might make things more readable for
> both dmesg and terminal output:
> 
> [   36.611358]   TAP version 14
> [   36.611953]   1..1
> [   36.611958]     TAP version 14
> [   36.611954]     1..3
> ...
> [   36.622914]     # overflow_calculation_test: s64: 21 arithmetic tests
> [   36.624020]     ok 1 - overflow_calculation_test
> ...
> [   36.731096]     # overflow_shift_test: ok: (s64)(0 << 63) == 0
> [   36.731840]     ok 2 - overflow_shift_test
> ...
> [   36.750294] kunit_try_catch: vmalloc: allocation failure: 18446744073709551615 bytes, mode:0xcc0(GFP_KERNEL),
> nodemask=(null),cpuset=/,mems_allowed=0
> ...
> [   36.805350]     # overflow_allocation_test: devm_kzalloc detected saturation
> [   36.807763]     ok 3 - overflow_allocation_test
> [   36.807763]     # overflow: PASS
> [   36.807765]   ok 1 - overflow
> [   36.807767]   # kunit: PASS
> 
> As it happens, subtests are a bit rare in kselftests right now, but
> for kunit, the "common" output (IIUC) will fundamentally be a top-level
> test running all the subtests, so we should get it right. :)

Personally, as a human I find the space prefix a bit easier to read.
However, I think that in "normal" kernel log output it is unusual for
a line to be prefixed with a hash (#), so this might be easier to
both visually distinguish and for automated parsing.
So I'm torn.  I don't have a strong opinion on space vs. hash prefix
for indicating sub-test.  I think the KUnit convention of matching
whatever was the prefix of the "TAP version 14" line is clever, but
it would be nice to just pick a prefix and be done with it.

IMHO.
 -- Tim

> -Kees
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=selftests/harness/tap
> 
> --
> Kees Cook