RE: [PATCH v2] selftests/resctrl: Print a message if the result of MBM&CMT tests is failed when Intel Sub-NUMA is enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Reinette,

> On 1/21/2022 12:00 AM, tan.shaopeng@xxxxxxxxxxx wrote:
> > Hi Reinette,
> >
> >> On 12/13/2021 2:03 AM, Shaopeng Tan wrote:
> >>> If the result of MBM&CMT tests is failed when Intel Sub-NUMA is
> >>> enabled, print a possible causes of failure.
> >>> Since when the Intel Sub-NUMA Clustering(SNC) feature is enabled,
> >>> the CMT and MBM counters may not be accurate.
> >>>
> >>> Signed-off-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
> >>> ---
> >>> Hello,
> >>>
> >>> According to the Intel RDT reference Manual, when the sub-numa
> >>> clustering feature is enabled, the CMT and MBM counters may not be
> >>> accurate.
> >>> When running CMT tests and MBM tests on 2nd Generation Intel Xeon
> >>> Scalable Processor, the result may be "not ok".
> >>> If result of MBM&CMT tests is failed when Intel Sub-NUMA is enabled,
> >>> fix it to print a possible cause of failure, instead of SKIP these
> >>> tests in v1.
> >>>
> >>> Thanks,
> >>>
> >>>  tools/testing/selftests/resctrl/Makefile      |  1 +
> >>>  tools/testing/selftests/resctrl/cmt_test.c    |  5 ++-
> >>>  tools/testing/selftests/resctrl/mbm_test.c    |  5 ++-
> >>>  tools/testing/selftests/resctrl/resctrl.h     |  2 ++
> >>>  .../testing/selftests/resctrl/resctrl_tests.c | 36
> >> +++++++++++++++++++
> >>>  tools/testing/selftests/resctrl/resctrlfs.c   | 26 ++++++++++++++
> >>>  6 files changed, 73 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/tools/testing/selftests/resctrl/Makefile
> >>> b/tools/testing/selftests/resctrl/Makefile
> >>> index adfd92145e12..6d257f40e6ac 100644
> >>> --- a/tools/testing/selftests/resctrl/Makefile
> >>> +++ b/tools/testing/selftests/resctrl/Makefile
> >>> @@ -1,6 +1,7 @@
> >>>  #SPDX-License-Identifier: GPL-2.0
> >>>
> >>>  CFLAGS += -g -Wall -O2 -D_FORTIFY_SOURCE=2
> >>> +LDLIBS += -lnuma
> >>>
> >>>  TEST_GEN_PROGS := resctrl_tests
> >>>  EXTRA_SOURCES := $(wildcard *.c)
> >>> diff --git a/tools/testing/selftests/resctrl/cmt_test.c
> >>> b/tools/testing/selftests/resctrl/cmt_test.c
> >>> index 8968e36db99d..c5a49444c5a0 100644
> >>> --- a/tools/testing/selftests/resctrl/cmt_test.c
> >>> +++ b/tools/testing/selftests/resctrl/cmt_test.c
> >>> @@ -136,8 +136,11 @@ int cmt_resctrl_val(int cpu_no, int n, char
> >> **benchmark_cmd)
> >>>  		return ret;
> >>>
> >>>  	ret = check_results(&param, n);
> >>> -	if (ret)
> >>> +	if (ret) {
> >>> +		if (sub_numa_cluster_enable)
> >>> +			ksft_print_msg("Sub-NUMA Clustering(SNC) feature
> >> is enabled, the
> >>> +CMT counters may not be accurate.\n");
> >>>  		return ret;
> >>> +	}
> >>>
> >>>  	cmt_test_cleanup();
> >>>
> >>> diff --git a/tools/testing/selftests/resctrl/mbm_test.c
> >>> b/tools/testing/selftests/resctrl/mbm_test.c
> >>> index 8392e5c55ed0..7dc1bdf2d0b8 100644
> >>> --- a/tools/testing/selftests/resctrl/mbm_test.c
> >>> +++ b/tools/testing/selftests/resctrl/mbm_test.c
> >>> @@ -136,8 +136,11 @@ int mbm_bw_change(int span, int cpu_no, char
> >> *bw_report, char **benchmark_cmd)
> >>>  		return ret;
> >>>
> >>>  	ret = check_results(span);
> >>> -	if (ret)
> >>> +	if (ret) {
> >>> +		if (sub_numa_cluster_enable)
> >>> +			ksft_print_msg("Sub-NUMA Clustering(SNC) feature
> >> is enabled, the
> >>> +MBM counters may not be accurate.\n");
> >>>  		return ret;
> >>> +	}
> >>>
> >>>  	mbm_test_cleanup();
> >>>
> >>> diff --git a/tools/testing/selftests/resctrl/resctrl.h
> >>> b/tools/testing/selftests/resctrl/resctrl.h
> >>> index 1ad10c47e31d..4b8ad4fbd016 100644
> >>> --- a/tools/testing/selftests/resctrl/resctrl.h
> >>> +++ b/tools/testing/selftests/resctrl/resctrl.h
> >>> @@ -76,6 +76,7 @@ extern pid_t bm_pid, ppid;
> >>>
> >>>  extern char llc_occup_path[1024];
> >>>  extern bool is_amd;
> >>> +extern bool sub_numa_cluster_enable;
> >>>
> >>>  bool check_resctrlfs_support(void);  int filter_dmesg(void); @@
> >>> -85,6 +86,7 @@ int umount_resctrlfs(void);  int
> >>> validate_bw_report_request(char *bw_report);  bool
> >>> validate_resctrl_feature_request(const char *resctrl_val);  char
> >>> *fgrep(FILE *inf, const char *str);
> >>> +char *fgrep_last_match_line(FILE *inf, const char *str);
> >>>  int taskset_benchmark(pid_t bm_pid, int cpu_no);  void
> >>> run_benchmark(int signum, siginfo_t *info, void *ucontext);  int
> >>> write_schemata(char *ctrlgrp, char *schemata, int cpu_no, diff --git
> >>> a/tools/testing/selftests/resctrl/resctrl_tests.c
> >>> b/tools/testing/selftests/resctrl/resctrl_tests.c
> >>> index 3be0895c492b..bbab4a7f37ed 100644
> >>> --- a/tools/testing/selftests/resctrl/resctrl_tests.c
> >>> +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
> >>> @@ -8,12 +8,15 @@
> >>>   *    Sai Praneeth Prakhya <sai.praneeth.prakhya@xxxxxxxxx>,
> >>>   *    Fenghua Yu <fenghua.yu@xxxxxxxxx>
> >>>   */
> >>> +#include <numa.h>
> >>> +#include <string.h>
> >>>  #include "resctrl.h"
> >>>
> >>>  #define BENCHMARK_ARGS		64
> >>>  #define BENCHMARK_ARG_SIZE	64
> >>>
> >>>  bool is_amd;
> >>> +bool sub_numa_cluster_enable;
> >>>
> >>>  void detect_amd(void)
> >>>  {
> >>> @@ -34,6 +37,35 @@ void detect_amd(void)
> >>>  	fclose(inf);
> >>>  }
> >>>
> >>> +void check_sub_numa_cluster(void)
> >>> +{
> >>> +	FILE *inf = fopen("/proc/cpuinfo", "r");
> >>> +	char *res, *s;
> >>> +	int socket_num = 0;
> >>> +	int numa_nodes = 0;
> >>> +
> >>> +	if (!inf)
> >>> +		return;
> >>> +
> >>> +	res = fgrep_last_match_line(inf, "physical id");
> >>> +
> >>> +	if (res) {
> >>> +		s = strpbrk(res, "1234567890");
> >>> +		socket_num = atoi(s) + 1;
> >>> +		free(res);
> >>> +	}
> >>> +	fclose(inf);
> >>> +
> >>> +	numa_nodes = numa_max_node() + 1;
> >>> +
> >>> +	/*
> >>> +	 * when the Sub-NUMA Clustering(SNC) feature is enabled,
> >>> +	 * the number of numa nodes is twice the number of sockets.
> >>> +	 */
> >>> +	if (numa_nodes == (2 * socket_num))
> >>> +		sub_numa_cluster_enable = true;
> >>> +}
> >>
> >>
> >> Unfortunately there does not seem to be an architectural way to
> >> detect if SNC has been enabled and the above test is fragile wrt the
> >> assumptions about the topology of the system. What we need is a
> >> reliable and future-proof test but I do not know what that should be.
> >
> > I understand your concerns.
> > At least I know SNC affects on the 2nd Generation Intel Xeon.
> > So, how about just printing information when the test is running on 2nd
> Generation Intel Xeon?
> > That is, when the result of MBM&CMT test is "not ok"
> > and if running cpu model is 2nd Generation Intel Xeon, then print
> > information about the possibility of failure (SNC may be enabled).
> > How about this idea?
> 
> I think that making this model specific would be hard to get right for all systems
> and hard to maintain. Perhaps we could just print a generic message on failure?
> Something
> like: "Intel CMT and MBM counters may be inaccurate when Sub-NUMA
> Clustering (SNC) is enabled. Ensure SNC is disabled in the BIOS if this system
> supports SNC." I'd be the first to admit that this is not ideal and would
> appreciate suggestions for improvement. Unfortunately we seem to lack a
> reliable and future proof way to detect if SNC is enabled but I do look forward to
> being corrected.

I can't think out a better idea, I am going to use your idea.
If MBM&CMT test is running on Intel cpu and the result of MBM&CMT test is "not ok",
then print a message:
"Intel CMT and MBM counters may be inaccurate when Sub-NUMA Clustering (SNC) is enabled. 
Ensure SNC is disabled in the BIOS if this system supports SNC."


Best regards,
Tan Shaopeng





[Index of Archives]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux