Re: [PATCH v2 1/6] selftests/resctrl: Extend signal handler coverage to unmount on receiving signal

Reinette Chatre <reinette.chatre@xxxxxxxxx> · Thu, 28 Sep 2023 10:09:58 -0700

Hi Ilpo,

On 9/28/2023 5:47 AM, Ilpo Järvinen wrote:
> On Tue, 26 Sep 2023, Reinette Chatre wrote:
>> On 9/15/2023 8:44 AM, Ilpo Järvinen wrote:

...

>>> +
>>> +static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
>>> +{
>>> +	int res;
>>> +
>>> +	ksft_print_msg("Starting MBM BW change ...\n");
>>> +
>>> +	if (test_prepare())
>>> +		return;
>>>  
>>
>> I am not sure about this. With this exit the kselftest machinery is not
>> aware of the test passing or failing. I wonder if there should not rather
>> be a "goto" here that triggers ksft_test_result()?
> 
> Yes, ksft_test_result() is needed here (I forgot to add it).
> 
>> This needs some more 
>> thought though. First, with this change test_prepare() officially gains
>> responsibility to determine if a failure is transient (just a single test
>> fails) or permanent (no use trying any other tests if this fails). For
>> the former it would then be up to the caller to call ksft_test_result()
>> and for the latter test_prepare() will call ksft_exit_fail_msg().
> 
> Well, I didn't initially have test_prepare() at all but all this was 
> within the test functions (which will be consolidated to a single function 
> by the series that comes after the two series are done + one patch from 
> Maciej).
> 
> I was just trying to do what was done previously but it seems I forgot to 
> handle the result status on signal reg fail path.
> 
> TBH, I wouldn't mind if the signal reg fail is also just escalated to 
> ksft_exit_fail_msg(). I don't think it can ever fail with the parameters 
> given to it, so its error handling feels pretty much like dead code (unless 
> some crazy thing such as apparmor does something out of the blue; I don't 
> know if apparmor is capable of overriding sigaction(), but I've seen 
> apparmor create errors that on the surface make no sense whatsoever, 
> comparable to this case).
> 
> So basically this discussion is now about what to do with the mount 
> failing which already does _exit() before this patch (and possibly some
> hypothetical, new prepare code after the consolidation work which also
> will have some impact and I believe we might actually want to kill 
> test_prepare() at that point anyway).

Having a failure during signal handler registration also trigger
ksft_exit_fail_msg() sounds fair to me. I am also ok with keeping the exit
when the mount fails.

If any future test_prepare() code can fail without ending the whole test run,
then I hope it would be obvious that ksft_test_result() needs to be called.
Perhaps that can be accomplished if test_prepare() does not exit the test but
instead just returns an error code (if needed it can use ksft_print_msg()
internally for details about particular failures) and the caller calls
ksft_exit_fail_msg() if test_prepare() fails? With the caller responsible for
ksft_exit_fail_msg() as well as ksft_test_result(), any new addition may be
guided to the right calls. This considers hypothetical future changes to code
that is being consolidated, so no strong opinions from my side.
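
A minimal sketch of that split, only to illustrate the idea. The stub_*
helpers stand in for ksft_test_result() and ksft_exit_fail_msg() so that the
snippet compiles on its own; they, simulate_prepare_failure and
run_one_test() are made-up names and not part of the selftests. The real
ksft_exit_fail_msg() does not return, the stand-in here only records the
request.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kselftest helpers (assumption, not the real API). */
static int nr_pass, nr_fail, exit_requested;

static void stub_test_result(bool ok, const char *name)
{
	if (ok)
		nr_pass++;
	else
		nr_fail++;
	printf("%s %s\n", ok ? "ok" : "not ok", name);
}

/* The real helper exits; this one only records that an exit was asked for. */
static void stub_exit_fail_msg(const char *msg)
{
	exit_requested++;
	printf("Bail out! %s\n", msg);
}

/* Hypothetical knob used to simulate a failing preparation step. */
static bool simulate_prepare_failure;

/* Prints details about the failure and returns an error; never exits. */
static int test_prepare(void)
{
	if (simulate_prepare_failure) {
		printf("# Failed to prepare test\n");
		return -1;
	}
	return 0;
}

/* The caller decides how a failure is accounted for. */
static int run_one_test(const char *name)
{
	int res = test_prepare();

	if (res) {
		stub_exit_fail_msg("Test preparation failed");
		return res;
	}

	/* The actual measurement would run here and set res. */
	stub_test_result(!res, name);
	return res;
}
```

With both calls visible in the caller, a new failure path added to
test_prepare() only needs to return an error to be accounted for.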

>> Second, that SNC warning may be an inconvenience with a new goto. Here
>> it may be ok to print that message before the test failure?
> 
> I don't follow what you're referring to with "that SNC warning". To the 
> "Intel CMT may be inaccurate ..." one?

Yes, that is the warning. I envisioned addressing the issue by adding a
goto label right before the ksft_test_result() call within run_cmt_test()
in this case (but also in run_mbm_test()). Doing so would ensure that the
test counters are updated on test_prepare() failure, but it would also
trigger the message you note, which would confuse the user if the test
failed because of a signal handler registration failure.
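
One possible shape, as a rough sketch only: print the warning before the
label so that it stays tied to a failed measurement, and let the
test_prepare() failure jump straight to the result reporting. All names
below (prepare_fails, measurement_fails, cmt_measure(),
run_cmt_test_sketch()) are hypothetical, the printf() calls stand in for
ksft_print_msg() and ksft_test_result(), and the vendor check is left out.

```c
#include <stdbool.h>
#include <stdio.h>

static int warnings_printed, results_reported;

/* Hypothetical knobs used to simulate the two failure cases. */
static bool prepare_fails, measurement_fails;

static int test_prepare(void)
{
	return prepare_fails ? -1 : 0;
}

/* Stand-in for the actual CMT measurement. */
static int cmt_measure(void)
{
	return measurement_fails ? -1 : 0;
}

static void run_cmt_test_sketch(void)
{
	int res;

	res = test_prepare();
	if (res)
		goto report;

	res = cmt_measure();
	if (res) {
		/* Only reached when the measurement itself failed. */
		printf("# Intel CMT may be inaccurate ...\n");
		warnings_printed++;
	}

report:
	/* Reached on every path so the result is always counted. */
	printf("%s CMT: test\n", res ? "not ok" : "ok");
	results_reported++;
}
```

The cost is that the warning is printed before the "not ok" line rather
than after it.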

...

>>> diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
>>> index 51963a6f2186..a9fe61133119 100644
>>> --- a/tools/testing/selftests/resctrl/resctrl_val.c
>>> +++ b/tools/testing/selftests/resctrl/resctrl_val.c
>>> @@ -468,7 +468,9 @@ pid_t bm_pid, ppid;
>>>  
>>>  void ctrlc_handler(int signum, siginfo_t *info, void *ptr)
>>>  {
>>> -	kill(bm_pid, SIGKILL);
>>> +	/* Only kill child after bm_pid is set after fork() */
>>> +	if (bm_pid)
>>> +		kill(bm_pid, SIGKILL);
>>>  	umount_resctrlfs();
>>>  	tests_cleanup();
>>>  	ksft_print_msg("Ending\n\n");
>>> @@ -485,6 +487,8 @@ int signal_handler_register(void)
>>>  	struct sigaction sigact;
>>>  	int ret = 0;
>>>  
>>> +	bm_pid = 0;
>>> +
>>
>> Since this is an initialization fix in this area ... what
>> do you think of also initializing sigact? It could just be
>> a change to
>> 	struct sigaction sigact = {};
>>
>> This will prevent registering a signal handler with 
>> uninitialized sa_flags.
> 
> Nice catch. It seems like quite a bad bug, I'll add another patch to fix it.
> 
> Thanks once again for your reviews! I'll also address the changelog 
> improvements you mentioned against the other patches.
> 
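
For reference, a small standalone sketch of the initialization I mean. It
is not the selftest code: sketch_handler(), register_handler_sketch() and
the choice of SIGUSR1 are only for illustration, and setting SA_SIGINFO is
my assumption of what a three-argument handler needs.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Three-argument handler form, which requires SA_SIGINFO. */
static void sketch_handler(int signum, siginfo_t *info, void *ptr)
{
	(void)signum;
	(void)info;
	(void)ptr;
}

static int register_handler_sketch(void)
{
	/*
	 * Zero every member up front so that no field, sa_flags included,
	 * holds stack garbage. The empty initializer is a GNU C extension
	 * before C23.
	 */
	struct sigaction sigact = {};

	sigact.sa_sigaction = sketch_handler;
	sigemptyset(&sigact.sa_mask);
	sigact.sa_flags = SA_SIGINFO;

	return sigaction(SIGUSR1, &sigact, NULL) ? -1 : 0;
}
```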

Thanks to you for improving the resctrl selftests so significantly.
This work is very valuable because we use it to measure and gain
confidence in the health of the resctrl subsystem.

Reinette