Re: [PATCH v3 5/5] selftests/resctrl: Add non-contiguous CBMs CAT test

Hi Maciej,

On 2/2/2024 2:17 AM, Maciej Wieczor-Retman wrote:
> On 2024-02-01 at 11:47:44 -0800, Reinette Chatre wrote:
>> Hi Maciej,
>>
>> On 1/31/2024 4:55 AM, Maciej Wieczor-Retman wrote:
>>> On 2024-01-26 at 13:10:18 -0800, Reinette Chatre wrote:
>>>> On 1/25/2024 3:13 AM, Maciej Wieczor-Retman wrote:
>>>>> +	if (sparse_masks != ((ecx >> 3) & 1)) {
>>>>> +		ksft_print_msg("CPUID output doesn't match 'sparse_masks' file content!\n");
>>>>> +		return -1;
>>>>
>>>> If I understand correctly this falls into the "test failure" [1] category
>>>> and should return 1? ...
>>>>
>>>>> +	}
>>>>> +
>>>>> +	/* Write checks initialization. */
>>>>> +	ret = get_full_cbm(test->resource, &full_cache_mask);
>>>>> +	if (ret < 0)
>>>>> +		return ret;
>>>>> +	bit_center = count_bits(full_cache_mask) / 2;
>>>>> +	cont_mask = full_cache_mask >> bit_center;
>>>>> +
>>>>> +	/* Contiguous mask write check. */
>>>>> +	snprintf(schemata, sizeof(schemata), "%lx", cont_mask);
>>>>> +	ret = write_schemata("", schemata, uparams->cpu, test->resource);
>>>>> +	if (ret) {
>>>>> +		ksft_print_msg("Write of contiguous CBM failed\n");
>>>>> +		return ret;
>>>>
>>>> ... although here I think the goal to distinguish between test error and test failure
>>>> falls apart since it is not possible to tell within the test if the failure is
>>>> because of an error in the test or because the test failed.
>>>
>>> Is there even a distinction between test error and failure in the resctrl selftests?
>>
>> There is such a distinction in the current tests (and from what I understand the reason
>> behind the logical XOR used in this test). In existing tests the running of
>> the test precedes and is clearly separate from determining the test pass/fail.
>> All the current tests have a clear "run the test" phase where data is collected to
>> a file, followed by an analysis (aka "check results") phase that looks at collected
>> data to determine if the test passes or fails.
>> Note how all the "check results" return either 0 or 1 to indicate test pass
>> or fail respectively. Specifically, you can refer to:
>> mbm_test.c->check_results()
>> mba_test.c->check_results()
>> cmt_test.c->check_results()
>> cat_test.c->check_results()
>>
>>> I've been looking at it for a while and can't find any instances where
>>> ksft_test_result_error() would be used. Everywhere I look it's either pass or
>>> fail. By grepping over all selftests I found only five tests that use
>>> ksft_test_result_error().
>>
>> Yes, from the user perspective there is no such distinction. This seems to
>> be entirely internal to the resctrl selftests (but I do not think that this
>> should or can be a hard requirement).
> 
> Okay, thank you, that's what I wanted to know.
> 
>>
>>>
>>> Furthermore, there is this one "TODO" in kselftest.h:
>>>
>>> 	/* TODO: how does "error" differ from "fail" or "skip"? */
>>>
>>> If you meant the distinction less literally then I'd say the sparse_masks
>>> comparison to CPUID would be a failure. What I had in mind is that it tries to
>>> validate a resctrl interface relevant to non-contiguous CBMs. If it fails
>>> there is probably something wrong with the code concerning non-contiguous CBMs.
>>
>> Wrong with which code? As I understand it, this particular check compares the
>> resctrl view of the world to the hardware realities. If this check fails
>> then I do not think this is an issue with the test code (which would make it a test
>> error) but instead a resctrl bug and thus a test failure.
> 
> I also meant a resctrl bug. I was thinking about the kernel resctrl code that
> handles taking the CPUID information about non-contiguous CBMs and putting it in
> the sparse_masks file.
> 
> If there was a hardware problem and CPUID returned wrong information, then the
> check wouldn't fail as sparse_masks relies on CPUID too and both values would
> match. So in view of this I thought that this check could make sure that the
> resctrl kernel code handles CPUID returned information properly.
> 
> So should this check be moved from the "run the test" phase to the end of the
> function ("check results" phase) to signify that it's not an error but a
> failure?

I do not think this test matches the "run" and "check" phases of previous tests,
unless you create a new test for every scenario checked within this test.

Just returning 1 when the check (if (sparse_masks != ((ecx >> 3) & 1))) fails
should be ok, no?
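
For illustration, the hunk quoted above with only the return value changed
(a sketch, everything else as in your patch):

	if (sparse_masks != ((ecx >> 3) & 1)) {
		ksft_print_msg("CPUID output doesn't match 'sparse_masks' file content!\n");
		/* resctrl's view disagrees with CPUID: a test failure, not a test error */
		return 1;
	}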

>>> On the other hand, writing contiguous CBMs shouldn't fail as far as the
>>> non-contiguous CBM CAT test is concerned. So if that fails, there might be
>>> something wrong on a higher level, and I'd say that can be more of an error
>>> than a failure.
>>
>> I think that the write_schemata() can fail for a variety of reasons, some may
>> indicate an issue with the test while some may indicate an issue with resctrl.
>> It is not possible for the caller of write_schemata() to distinguish.
>>
>>> But I'm just saying how I understood it so far. If there is some clear
>>> distinction between error and failure definitions I could try to separate it
>>> more explicitly.
>>
>> I do not think it is possible to clearly distinguish between error and failure.
>> These are already lumped together as ksft_test_result_fail() anyway, so there
>> is no risk of confusion for folks just running the tests.
>> I think the final test result may be confusing to folks parsing the
>> resctrl selftest internals:
>>
>> 	run_single_test()
>> 	{
>> 		...
>> 		ret = test->run_test(test, uparams);
>> 		ksft_test_result(!ret, "%s: test\n", test->name);
>> 		...
>> 	}
>>
>> above means that a test returning a negative or a greater-than-zero value is
>> considered a test failure, and resctrl tests may return either in the case of
>> an actual test failure ... but from the user's perspective there is no
>> difference, so I do not think it is an issue, just a lack of consistency in
>> the resctrl test internals in cases like a write_schemata() failure, where a
>> possible test failure is captured as a test error.
>>
>> I do not think it is required to be strict here. Keeping "test returns
>> negative or greater than zero on test failure" seems reasonable to me.
> 
> Okay, so the approach I applied in noncont_cat_run_test() with write_schemata()
> is acceptable?

In general I'd say passing along a write_schemata() error return will be
acceptable, but you should be consistent within this test. There are two
write_schemata() calls in this test: one treats an error return as a test
failure and the other treats it as a test error. Considering this inconsistency
I would rather suggest that you always treat a write_schemata() error return as
a test failure.
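
For example (a sketch only, reusing the names from the hunk quoted earlier):

	/* Contiguous mask write check. */
	snprintf(schemata, sizeof(schemata), "%lx", cont_mask);
	ret = write_schemata("", schemata, uparams->cpu, test->resource);
	if (ret) {
		ksft_print_msg("Write of contiguous CBM failed\n");
		return 1;	/* a test failure, not write_schemata()'s error code */
	}

with the non-contiguous write check returning 1 on its failure path as well.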

Reinette




