Re: [PATCH v2 1/3] kunit: Always run cleanup from a test kthread

Benjamin Berg <benjamin@xxxxxxxxxxxxxxxx> · Thu, 20 Apr 2023 09:34:54 +0200

Hi David,

On Wed, 2023-04-19 at 16:54 +0800, David Gow wrote:
> KUnit tests run in a kthread, with the current->kunit_test pointer set
> to the test's context. This allows the kunit_get_current_test() and
> kunit_fail_current_test() macros to work. Normally, this pointer is
> still valid during test shutdown (i.e., the suite->exit function, and
> any resource cleanup). However, if the test has exited early (e.g., due
> to a failed assertion), the cleanup is done in the parent KUnit thread,
> which does not have an active context.
> 
> Instead, in the event test terminates early, run the test exit and
> cleanup from a new 'cleanup' kthread, which sets current->kunit_test,
> and better isolates the rest of KUnit from issues which arise in test
> cleanup.
> 
> If a test cleanup function itself aborts (e.g., due to an assertion
> failing), there will be no further attempts to clean up: an error will
> be logged and the test failed.
> 
> This should also make it easier to get access to the KUnit context,
> particularly from within resource cleanup functions, which may, for
> example, need access to data in test->priv.
> 
> Signed-off-by: David Gow <davidgow@xxxxxxxxxx>
> ---
> This is an updated version of / replacement of "kunit: Set the current
> KUnit context when cleaning up", which instead creates a new kthread
> for cleanup tasks if the original test kthread is aborted. This protects
> us from failed assertions during cleanup, if the test exited early.
> 
> Changes since v1:
> https://lore.kernel.org/linux-kselftest/20230415091401.681395-1-davidgow@xxxxxxxxxx/
> - Move cleanup execution to another kthread
>   - (Thanks, Benjamin, for pointing out the assertion issues)

Nice, I think this is looking promising. After thinking about it a bit,
maybe one thing to improve is to always start the new cleanup kthread
from kunit_run_case_catch_errors.

That way there is only one codepath. But, more importantly, it means if
the cleanup fails the first time, we do not risk running it a second
time and we get slightly nicer error reporting. Not that this happening
would be a big issue.

Benjamin

> ---
>  lib/kunit/test.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 52 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/kunit/test.c b/lib/kunit/test.c
> index e2910b261112..caeae0dfd82b 100644
> --- a/lib/kunit/test.c
> +++ b/lib/kunit/test.c
> @@ -423,8 +423,51 @@ static void kunit_try_run_case(void *data)
>         kunit_run_case_cleanup(test, suite);
>  }
>  
> +static void kunit_try_run_case_cleanup(void *data)
> +{
> +       struct kunit_try_catch_context *ctx = data;
> +       struct kunit *test = ctx->test;
> +       struct kunit_suite *suite = ctx->suite;
> +
> +       current->kunit_test = test;
> +
> +       kunit_run_case_cleanup(test, suite);
> +}
> +
> +static void kunit_catch_run_case_cleanup(void *data)
> +{
> +       struct kunit_try_catch_context *ctx = data;
> +       struct kunit *test = ctx->test;
> +       int try_exit_code = kunit_try_catch_get_result(&test->try_catch);
> +
> +       /* It is always a failure if cleanup aborts. */
> +       kunit_set_failure(test);
> +
> +       if (try_exit_code) {
> +               /*
> +                * Test case could not finish, we have no idea what state it is
> +                * in, so don't do clean up.
> +                */
> +               if (try_exit_code == -ETIMEDOUT) {
> +                       kunit_err(test, "test case cleanup timed out\n");
> +               /*
> +                * Unknown internal error occurred preventing test case from
> +                * running, so there is nothing to clean up.
> +                */
> +               } else {
> +                       kunit_err(test, "internal error occurred during test case cleanup: %d\n",
> +                                 try_exit_code);
> +               }
> +               return;
> +       }
> +
> +       kunit_err(test, "test aborted during cleanup. continuing without cleaning up\n");
> +}
> +
> +
>  static void kunit_catch_run_case(void *data)
>  {
> +       struct kunit_try_catch cleanup;
>         struct kunit_try_catch_context *ctx = data;
>         struct kunit *test = ctx->test;
>         struct kunit_suite *suite = ctx->suite;
> @@ -451,9 +494,16 @@ static void kunit_catch_run_case(void *data)
>  
>         /*
>          * Test case was run, but aborted. It is the test case's business as to
> -        * whether it failed or not, we just need to clean up.
> +        * whether it failed or not, we just need to clean up. Do this in a new
> +        * try / catch context, in case it asserts, too.
>          */
> -       kunit_run_case_cleanup(test, suite);
> +       kunit_try_catch_init(&cleanup,
> +                            test,
> +                            kunit_try_run_case_cleanup,
> +                            kunit_catch_run_case_cleanup);
> +       ctx->test = test;
> +       ctx->suite = suite;
> +       kunit_try_catch_run(&cleanup, ctx);
>  }
>  
>  /*