Re: [RFC v4 08/17] kunit: test: add support for test abort

Knut Omang <knut.omang@xxxxxxxxxx> · Fri, 22 Mar 2019 08:10:18 +0100

On Thu, 2019-03-21 at 18:41 -0700, Brendan Higgins wrote:
> On Thu, Mar 21, 2019 at 6:10 PM Frank Rowand <frowand.list@xxxxxxxxx> wrote:
> > On 2/27/19 11:42 PM, Brendan Higgins wrote:
> > > On Tue, Feb 19, 2019 at 10:44 PM Frank Rowand <frowand.list@xxxxxxxxx>
> > > wrote:
> > > > On 2/19/19 7:39 PM, Brendan Higgins wrote:
> > > > > On Mon, Feb 18, 2019 at 11:52 AM Frank Rowand <frowand.list@xxxxxxxxx>
> > > > > wrote:
> > > > > > On 2/14/19 1:37 PM, Brendan Higgins wrote:
> > > > > > > Add support for aborting/bailing out of test cases. Needed for
> > > > > > > implementing assertions.
> > > > > > > 
> > > > > > > Signed-off-by: Brendan Higgins <brendanhiggins@xxxxxxxxxx>
> > > > > > > ---
> > > > > > > Changes Since Last Version
> > > > > > >  - This patch is new introducing a new cross-architecture way to
> > > > > > > abort
> > > > > > >    out of a test case (needed for KUNIT_ASSERT_*, see next patch
> > > > > > > for
> > > > > > >    details).
> > > > > > >  - On a side note, this is not a complete replacement for the UML
> > > > > > > abort
> > > > > > >    mechanism, but covers the majority of necessary functionality.
> > > > > > > UML
> > > > > > >    architecture specific featurs have been dropped from the
> > > > > > > initial
> > > > > > >    patchset.
> > > > > > > ---
> > > > > > >  include/kunit/test.h |  24 +++++
> > > > > > >  kunit/Makefile       |   3 +-
> > > > > > >  kunit/test-test.c    | 127 ++++++++++++++++++++++++++
> > > > > > >  kunit/test.c         | 208
> > > > > > > +++++++++++++++++++++++++++++++++++++++++--
> > > > > > >  4 files changed, 353 insertions(+), 9 deletions(-)
> > > > > > >  create mode 100644 kunit/test-test.c
> > > > > > 
> > > > > > < snip >
> > > > > > 
> > > > > > > diff --git a/kunit/test.c b/kunit/test.c
> > > > > > > index d18c50d5ed671..6e5244642ab07 100644
> > > > > > > --- a/kunit/test.c
> > > > > > > +++ b/kunit/test.c
> > > > > > > @@ -6,9 +6,9 @@
> > > > > > >   * Author: Brendan Higgins <brendanhiggins@xxxxxxxxxx>
> > > > > > >   */
> > > > > > > 
> > > > > > > -#include <linux/sched.h>
> > > > > > >  #include <linux/sched/debug.h>
> > > > > > > -#include <os.h>
> > > > > > > +#include <linux/completion.h>
> > > > > > > +#include <linux/kthread.h>
> > > > > > >  #include <kunit/test.h>
> > > > > > > 
> > > > > > >  static bool kunit_get_success(struct kunit *test)
> > > > > > > @@ -32,6 +32,27 @@ static void kunit_set_success(struct kunit
> > > > > > > *test, bool success)
> > > > > > >       spin_unlock_irqrestore(&test->lock, flags);
> > > > > > >  }
> > > > > > > 
> > > > > > > +static bool kunit_get_death_test(struct kunit *test)
> > > > > > > +{
> > > > > > > +     unsigned long flags;
> > > > > > > +     bool death_test;
> > > > > > > +
> > > > > > > +     spin_lock_irqsave(&test->lock, flags);
> > > > > > > +     death_test = test->death_test;
> > > > > > > +     spin_unlock_irqrestore(&test->lock, flags);
> > > > > > > +
> > > > > > > +     return death_test;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void kunit_set_death_test(struct kunit *test, bool
> > > > > > > death_test)
> > > > > > > +{
> > > > > > > +     unsigned long flags;
> > > > > > > +
> > > > > > > +     spin_lock_irqsave(&test->lock, flags);
> > > > > > > +     test->death_test = death_test;
> > > > > > > +     spin_unlock_irqrestore(&test->lock, flags);
> > > > > > > +}
> > > > > > > +
> > > > > > >  static int kunit_vprintk_emit(const struct kunit *test,
> > > > > > >                             int level,
> > > > > > >                             const char *fmt,
> > > > > > > @@ -70,13 +91,29 @@ static void kunit_fail(struct kunit *test,
> > > > > > > struct kunit_stream *stream)
> > > > > > >       stream->commit(stream);
> > > > > > >  }
> > > > > > > 
> > > > > > > +static void __noreturn kunit_abort(struct kunit *test)
> > > > > > > +{
> > > > > > > +     kunit_set_death_test(test, true);
> > > > > > > +
> > > > > > > +     test->try_catch.throw(&test->try_catch);
> > > > > > > +
> > > > > > > +     /*
> > > > > > > +      * Throw could not abort from test.
> > > > > > > +      */
> > > > > > > +     kunit_err(test, "Throw could not abort from test!");
> > > > > > > +     show_stack(NULL, NULL);
> > > > > > > +     BUG();
> > > > > > 
> > > > > > kunit_abort() is what will be call as the result of an assert
> > > > > > failure.
> > > > > 
> > > > > Yep. Does that need clarified somewhere.
> > > > > > BUG(), which is a panic, which is crashing the system is not
> > > > > > acceptable
> > > > > > in the Linux kernel.  You will just annoy Linus if you submit this.
> > > > > 
> > > > > Sorry, I thought this was an acceptable use case since, a) this should
> > > > > never be compiled in a production kernel, b) we are in a pretty bad,
> > > > > unpredictable state if we get here and keep going. I think you might
> > > > > have said elsewhere that you think "a" is not valid? In any case, I
> > > > > can replace this with a WARN, would that be acceptable?
> > > > 
> > > > A WARN may or may not make sense, depending on the context.  It may
> > > > be sufficient to simply report a test failure (as in the old version
> > > > of case (2) below.
> > > > 
> > > > Answers to "a)" and "b)":
> > > > 
> > > > a) it might be in a production kernel
> > > 
> > > Sorry for a possibly stupid question, how might it be so? Why would
> > > someone intentionally build unit tests into a production kernel?
> > 
> > People do things.  Just expect it.
> 
> Huh, alright. I will take your word for it then.

I have a better explanation: Production kernels have bugs, unfortunately.
And sometimes those need to be investigated on systems than cannot be 
brought down or affected more than absolutely necessary, maybe via a third party
doing the execution. A light weight, precise test (well tested ahead :) ) might
be a way of proving or disproving assumptions that can lead to the development
and application of a fix. 

IMHO you're confusing "building into" with temporary applying, then removing
again - like the difference between running a local user space program vs
installing it under /usr and have it in everyone's PATH.

> > > > a') it is not acceptable in my development kernel either

I think one of the fundamental properties of a good test framework is that it
should not require changes to the code under test by itself.

Knut

> > > Fair enough.
> > > 
> > > > b) No.  You don't crash a developer's kernel either unless it is
> > > > required to avoid data corruption.
> > > Alright, I thought that was one of those cases, but I am not going to
> > > push the point. Also, in case it wasn't clear, the path where BUG()
> > > gets called only happens if there is a bug in KUnit itself, not just
> > > because a test case fails catastrophically.
> > 
> > Still not out of the woods.  Still facing Lions and Tigers and Bears,
> > Oh my!
> 
> Nope, I guess not :-)
> 
> > So kunit_abort() is normally called as the result of an assert
> > failure (as written many lines further above).
> > 
> > kunit_abort()
> >    test->try_catch.throw(&test->try_catch)
> >    // this is really kunit_generic_throw(), yes?
> >       complete_and_exit()
> >          if (comp)
> >             // comp is test_case_completion?
> >             complete(comp)
> >          do_exit()
> >             // void __noreturn do_exit(long code)
> >             // depending on the task, either panic
> >             // or the task dies
> 
> You are right up until after it calls do_exit().
> 
> KUnit actually spawns a thread for the test case to run in so that
> when exit is called, only the test case thread dies. The thread that
> started KUnit is never affected.
> 
> > I did not read through enough of the code to understand what is going
> > on here.  Is each kunit_module executed in a newly created thread?
> > And if kunit_abort() is called then that thread dies?  Or something
> > else?
> 
> Mostly right, each kunit_case (not kunit_module) gets executed in its
> own newly created thread. If kunit_abort() is called in that thread,
> the kunit_case thread dies. The parent thread keeps going, and other
> test cases are executed.
> 
> > 
> > > > b') And you can not do replacements like:
> > > > 
> > > > (1) in of_unittest_check_tree_linkage()
> > > > 
> > > > -----  old  -----
> > > > 
> > > >         if (!of_root)
> > > >                 return;
> > > > 
> > > > -----  new  -----
> > > > 
> > > >         KUNIT_ASSERT_NOT_ERR_OR_NULL(test, of_root);
> > > > 
> > > > (2) in of_unittest_property_string()
> > > > 
> > > > -----  old  -----
> > > > 
> > > >         /* of_property_read_string_index() tests */
> > > >         rc = of_property_read_string_index(np, "string-property", 0,
> > > > strings);
> > > >         unittest(rc == 0 && !strcmp(strings[0], "foobar"),
> > > > "of_property_read_string_index() failure; rc=%i\n", rc);
> > > > 
> > > > -----  new  -----
> > > > 
> > > >         /* of_property_read_string_index() tests */
> > > >         rc = of_property_read_string_index(np, "string-property", 0,
> > > > strings);
> > > >         KUNIT_ASSERT_EQ(test, rc, 0);
> > > >         KUNIT_EXPECT_STREQ(test, strings[0], "foobar");
> > > > 
> > > > 
> > > > If a test fails, that is no reason to abort testing.  The remainder of
> > > > the unit
> > > > tests can still run.  There may be cascading failures, but that is ok.
> > > 
> > > Sure, that's what I am trying to do. I don't see how (1) changes
> > > anything, a failed KUNIT_ASSERT_* only bails on the current test case,
> > > it does not quit the entire test suite let alone crash the kernel.
> > 
> > This may be another case of whether a kunit_module is approximately a
> > single KUNIT_EXPECT_*() or a larger number of them.
> > 
> > I still want, for example, of_unittest_property_string() to include a large
> > number of KUNIT_EXPECT_*() instances.  In that case I still want the rest of
> > the tests in the kunit_module to be executed even after a KUNIT_ASSERT_*()
> > fails.  The existing test code has that property.
> 
> Sure, in the context of the reply you just sent me on the DT unittest
> thread, that makes sense. I can pull out all but the ones that would
> have terminated the collection of test cases (where you return early),
> if that makes it better.