Hi Maciej,

On 1/31/2024 4:55 AM, Maciej Wieczor-Retman wrote:
> On 2024-01-26 at 13:10:18 -0800, Reinette Chatre wrote:
>> On 1/25/2024 3:13 AM, Maciej Wieczor-Retman wrote:
>>> Add tests for both L2 and L3 CAT to verify the return values
>>> generated by writing non-contiguous CBMs don't contradict the
>>> reported non-contiguous support information.
>>>
>>> Use a logical XOR to confirm return value of write_schemata() and
>>> non-contiguous CBMs support information match.
>>>
>>> Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@xxxxxxxxx>
>>> ---
>>> Changelog v3:
>>> - Roll back __cpuid_count part. (Reinette)
>>> - Update function name to read sparse_masks file.
>>> - Roll back get_cache_level() changes.
>>> - Add ksft_print_msg() to contiguous schemata write error handling
>>>   (Reinette).
>>>
>>> Changelog v2:
>>> - Redo the patch message. (Ilpo)
>>> - Tidy up __cpuid_count calls. (Ilpo)
>>> - Remove redundant AND in noncont_mask calculations (Ilpo)
>>> - Fix bit_center offset.
>>> - Add newline before function return. (Ilpo)
>>> - Group non-contiguous tests with CAT tests. (Ilpo)
>>> - Use a helper for reading sparse_masks file. (Ilpo)
>>> - Make get_cache_level() available in other source files. (Ilpo)
>>>
>>>  tools/testing/selftests/resctrl/cat_test.c    | 81 +++++++++++++++++++
>>>  tools/testing/selftests/resctrl/resctrl.h     |  2 +
>>>  .../testing/selftests/resctrl/resctrl_tests.c |  2 +
>>>  3 files changed, 85 insertions(+)
>>>
>>> diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
>>> index 39fc9303b8e8..9086bf359072 100644
>>> --- a/tools/testing/selftests/resctrl/cat_test.c
>>> +++ b/tools/testing/selftests/resctrl/cat_test.c
>>> @@ -294,6 +294,71 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param
>>>  	return ret;
>>>  }
>>>
>>> +static int noncont_cat_run_test(const struct resctrl_test *test,
>>> +				const struct user_params *uparams)
>>> +{
>>> +	unsigned long full_cache_mask, cont_mask, noncont_mask;
>>> +	unsigned int eax, ebx, ecx, edx, ret, sparse_masks;
>>> +	char schemata[64];
>>> +	int bit_center;
>>> +
>>> +	/* Check to compare sparse_masks content to CPUID output. */
>>> +	ret = resource_info_unsigned_get(test->resource, "sparse_masks", &sparse_masks);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	if (!strcmp(test->resource, "L3"))
>>> +		__cpuid_count(0x10, 1, eax, ebx, ecx, edx);
>>> +	else if (!strcmp(test->resource, "L2"))
>>> +		__cpuid_count(0x10, 2, eax, ebx, ecx, edx);
>>> +	else
>>> +		return -EINVAL;
>>> +
>>> +	if (sparse_masks != ((ecx >> 3) & 1)) {
>>> +		ksft_print_msg("CPUID output doesn't match 'sparse_masks' file content!\n");
>>> +		return -1;
>>
>> If I understand correctly this falls into the "test failure" [1] category
>> and should return 1? ...
>>
>>> +	}
>>> +
>>> +	/* Write checks initialization. */
>>> +	ret = get_full_cbm(test->resource, &full_cache_mask);
>>> +	if (ret < 0)
>>> +		return ret;
>>> +	bit_center = count_bits(full_cache_mask) / 2;
>>> +	cont_mask = full_cache_mask >> bit_center;
>>> +
>>> +	/* Contiguous mask write check. */
>>> +	snprintf(schemata, sizeof(schemata), "%lx", cont_mask);
>>> +	ret = write_schemata("", schemata, uparams->cpu, test->resource);
>>> +	if (ret) {
>>> +		ksft_print_msg("Write of contiguous CBM failed\n");
>>> +		return ret;
>>
>> ... although here I think the goal to distinguish between test error and test failure
>> falls apart since it is not possible to tell within the test if the failure is
>> because of error in the test or if test failed.

> Is there even a distinction between test error and failure in resctrl selftest?

There is such a distinction in the current tests (and from what I understand it
is the reason behind the logical XOR used in this test). In the existing tests
the running of the test precedes, and is clearly separate from, the
determination of test pass/fail. All the current tests have a clear "run the
test" phase where data is collected to a file, followed by an analysis (aka
"check results") phase that looks at the collected data to determine if the
test passes or fails. Note how all the "check results" functions return either
0 or 1 to indicate test pass or fail respectively. Specifically, you can refer
to:
	mbm_test.c->check_results()
	mba_test.c->check_results()
	cmt_test.c->check_results()
	cat_test.c->check_results()

> I've been looking at it for a while and can't find any instances where
> ksft_test_result_error() would be used. Everywhere I look it's either pass or
> fail. By grep-ing over all selftests I found only five tests that use
> ksft_test_result_error().

Yes, from the user perspective there is no such distinction. This seems to be
entirely internal to the resctrl selftests (but I do not think that this should
or can be a hard requirement).

> Furthermore there is this one "TODO" in kselftests.h:
>
> /* TODO: how does "error" differ from "fail" or "skip"? */
>
> If you meant the distinction less literally then I'd say the sparse_masks
> comparison to CPUID would be a failure. What I had in mind is that it tries to
> validate a resctrl interface relevant to non-contiguous CBMs. If it fails
> there is probably something wrong with the code concerning non-contiguous CBMs.

Wrong with which code? As I understand it, this particular check compares the
resctrl view of the world to the hardware realities. If this check fails then I
do not think this is an issue with the test code (which would make it a test
error) but instead a resctrl bug and thus a test failure.

> On the other hand writing contiguous CBMs shouldn't fail as far as the
> non-contiguous CBMs in CAT test is concerned. So if that fails there might be
> something wrong on a higher level and I'd say that can be more of an error than
> a failure.

I think that write_schemata() can fail for a variety of reasons, some of which
may indicate an issue with the test while others may indicate an issue with
resctrl. It is not possible for the caller of write_schemata() to distinguish
between the two.

> But I'm just saying how I understood it so far. If there is some clear
> distinction between error and failure definitions I could try to separate it
> more explicitly.

I do not think it is possible to clearly distinguish between error and failure.
These are already lumped together as a ksft_test_result_fail() anyway so there
is no risk of confusion to folks just running the tests. I think the final test
result may be confusing to folks parsing the resctrl selftest internals:

	run_single_test()
	{
		...
		ret = test->run_test(test, uparams);
		ksft_test_result(!ret, "%s: test\n", test->name);
		...
	}

The above means that a test returning a negative or greater than zero value is
considered a test failure, and resctrl tests may return either in the case of
an actual test failure ... but from the user perspective there is no
difference, so I do not think it is an issue, just a lack of consistency in the
resctrl test internals in cases like a write_schemata() failure where a
possible test fail is captured as a test error.
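For illustration, a minimal sketch of how the logical XOR from this patch could
live in a separate "check results" style function following the 0/1 convention
of the existing tests listed above (hypothetical code: the function name and
parameters are made up, and it assumes the ksft_print_msg() helper from
kselftest.h that the resctrl tests already use):

	static int noncont_check_result(int write_ret, unsigned int sparse_masks)
	{
		/*
		 * Hypothetical sketch: a write of a non-contiguous CBM should
		 * succeed exactly when 'sparse_masks' reports support, so a
		 * logical XOR of the two conditions flags any mismatch. A
		 * mismatch is reported as a test failure (return 1), keeping
		 * negative returns for errors in running the test itself.
		 */
		if ((!write_ret) ^ (!!sparse_masks)) {
			ksft_print_msg("Non-contiguous CBM write result and 'sparse_masks' disagree\n");
			return 1;
		}

		return 0;
	}

Keeping such a check separate from the "run the test" phase would match the
structure of the existing tests, but as below I do not see it as a requirement.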
I do not think it is required to be strict here. Keeping "test returns negative
or greater than zero on test failure" seems reasonable to me.

Reinette