Just a quick note that there's been a lot of good discussion. I have an
updated draft of the document, but I need to review the flurry of comments
from today, and I'm busy getting my slides ready for a conference. So I
just wanted to give a heads-up that I'll be working on this (responding to
comments and hopefully posting an updated draft) early next week. Thanks
for the feedback.
 -- Tim

> -----Original Message-----
> From: Frank Rowand <frowand.list@xxxxxxxxx>
>
> On 2020-06-16 23:05, David Gow wrote:
> > On Wed, Jun 17, 2020 at 11:36 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> >>
> >> On Wed, Jun 17, 2020 at 02:30:45AM +0000, Bird, Tim wrote:
> >>> Agreed. You only need machine-parsable data if you expect the CI
> >>> system to do something more with the data than just present it.
> >>> What that would be (something common to all tests, or at least to
> >>> many of them) is unclear. Maybe there are patterns in the diagnostic
> >>> data that could lead to higher-level analysis, or even automated
> >>> fixes, that don't become apparent if the data is unstructured. But
> >>> it's hard to know until you have lots of data. I think just getting
> >>> the other things consistent is a good priority right now.
> >>
> >> Yeah. I think the main place for this is performance analysis, but I
> >> think that's a separate system entirely. TAP is really strictly
> >> yes/no, whereas performance analysis is a whole other thing. The only
> >> other thing I can think of is some kind of feature analysis, but that
> >> would be built out of the standard yes/no output. I.e., if I create a
> >> test that checks for specific security mitigation features
> >> (*cough*LKDTM*cough*) and have a dashboard that shows features down
> >> one axis and architectures and/or kernel versions on the other axes,
> >> then I get a pretty picture. But it's still being built out of the
> >> yes/no info.
> >>
> >> *shrug*
> >>
> >> I think diagnostics should be expressly non-machine-oriented.
> >
> > So from the KUnit side, we sort of have three kinds of diagnostic lines:
> >
> > - Lines printed directly from tests (typically using kunit_info() or
> > similar functions): as I understand it, these are basically the
> > equivalent of what kselftest typically uses diagnostics for --
> > test-specific, human-readable messages. I don't think we need/want to
> > parse these much.
> >
> > - Kernel messages during test execution: if we get the results by
> > scraping the kernel log (which is still the default for KUnit, though
> > there is also a debugfs interface), other kernel messages can be
> > interleaved with the results. Sometimes these are irrelevant things
> > happening on another thread; sometimes they're something directly
> > related to the test which we'd like to capture (KASAN errors, for
> > instance). I don't think we want these to be machine-oriented, but we
> > may want to be able to filter them out.
>
> This is an important conceptual difference between testing a user-space
> program (which is the environment TAP was initially created for) and
> testing kernel code. This difference should be addressed in the KTAP
> standard. As noted above, a kernel test case may call into other kernel
> code, and that other kernel code generates messages that end up in the
> test output.
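To make the interleaving problem concrete, scraped output can look
something like the sketch below. This is purely illustrative (the suite
and test names, the KASAN line, and the messages are invented, not taken
from a real KUnit run); the point is just that result lines,
test-generated diagnostics, and unrelated kernel messages all arrive on
the same console stream:

  TAP version 14
  1..2
  # example_suite: initializing fake test fixture
  ok 1 - example_alloc_test
  BUG: KASAN: use-after-free in fake_driver_remove+0x130/0x1a0
  not ok 2 - example_remove_test

Here the "# ..." line is a diagnostic printed by the test itself, the
KASAN splat is an interleaved kernel message (possibly the very thing the
test meant to trigger), and only the "ok"/"not ok" lines are actual
results.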
>
> One issue with these kernel messages is that they may be warnings or
> errors, and to anyone other than the test creator it is probably hard to
> determine whether those warnings and errors are reporting bugs or whether
> they are expected results triggered by the test.
>
> I created a solution to report what error(s) were expected for a test,
> and a tool to validate whether the error(s) occurred or not. This is
> currently in the devicetree unittests, but the exact implementation
> should be discussed in the KUnit context, and it should be included in
> the KTAP spec.
>
> I can describe the current implementation and start a discussion of any
> issues in this thread, or I can start a new thread. Whichever seems
> appropriate to everyone.
>
> -Frank
>
> > - Expectation failures: as Brendan mentioned, KUnit will print some
> > diagnostic messages for individual assertion/expectation failures,
> > including the expected and actual values. We'd ideally like to be able
> > to identify and parse these, but keeping them human-readable is
> > definitely also a goal.
> >
> > Now, to be honest, I doubt that the distinction here would be of much
> > use to kselftest, but it could be nice to not go out of our way to make
> > parsing some diagnostic lines impossible. That being said, personally
> > I'm all for avoiding the YAML-for-diagnostic-messages stuff and
> > sticking to something simple and line-based, possibly standardising
> > the format of a few common diagnostic messages (e.g.,
> > assertions/expected values/etc.) in a way that's both human-readable
> > and parsable if possible.
> >
> > I agree that there's a lot of analysis that is possible with just the
> > yes/no data. There's probably some fancy correlation one could do even
> > with unstructured diagnostic logs, so I don't think overstructuring
> > things is a necessity by any means. Where we have different tests doing
> > similar sorts of things, though, consistency in message formatting
> > could help even if things are not explicitly parsed. Ensuring that
> > logging helper functions and the like spit things out in the same
> > format is probably a good first step down that path.
> >
> > Cheers,
> > -- David
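On Frank's expected-error point above: one way such an annotation could
work (this is a hypothetical sketch, not the format the devicetree
unittests actually use; Frank's description of the real implementation is
the authoritative one) is a pair of diagnostic lines that bracket the
kernel message the test expects to trigger, so a post-processing tool can
pair them up and flag anything unexpected:

  # example_test: EXPECT-BEGIN: OF: /fake-node: could not find property 'reg'
  OF: /fake-node: could not find property 'reg'
  # example_test: EXPECT-END: OF: /fake-node: could not find property 'reg'
  ok 3 - example_missing_property_test

A checker script could then verify that every EXPECT-BEGIN/EXPECT-END pair
encloses a matching kernel message and that no unannotated warnings or
errors slipped through.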
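On David's closing point about helper functions: pinning the format down
in one small helper is cheap. Below is a minimal sketch (the names
ktap_diag() and report_expect_failure() are invented for illustration;
they are not existing kselftest or KUnit APIs) of a userspace helper that
emits every diagnostic with the same "# " prefix and every expectation
failure in the same three-line shape, so the output stays human-readable
but is also easy to grep or parse later:

  #include <stdio.h>
  #include <stdarg.h>

  /* Print one TAP diagnostic line: "# " prefix, caller's message, newline. */
  static void ktap_diag(const char *fmt, ...)
  {
          va_list ap;

          printf("# ");
          va_start(ap, fmt);
          vprintf(fmt, ap);
          va_end(ap);
          printf("\n");
  }

  /*
   * Report an expectation failure in a fixed shape: a header line with
   * file/line and the failed expression, then the expected and actual
   * values on their own lines.
   */
  #define report_expect_failure(expr_str, expected, actual)           \
          do {                                                         \
                  ktap_diag("EXPECTATION FAILED at %s:%d: %s",         \
                            __FILE__, __LINE__, (expr_str));           \
                  ktap_diag("    expected: %ld", (long)(expected));    \
                  ktap_diag("    actual:   %ld", (long)(actual));      \
          } while (0)

A test would then do something like
report_expect_failure("buf_size == 16", 16, buf_size); and every failure,
from every test that uses the helper, lands in the log in the same
greppable form.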