Re: [PATCH v3 2/2] selftests/resctrl: Adjust SNC support messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Maciej,

On 7/4/24 12:23 AM, Maciej Wieczor-Retman wrote:
On 2024-07-03 at 13:51:03 -0700, Reinette Chatre wrote:
On 7/3/24 12:43 AM, Maciej Wieczór-Retman wrote:
On 3.07.2024 00:21, Reinette Chatre wrote:
On 7/1/24 7:18 AM, Maciej Wieczor-Retman wrote:

...

SNC might not be enabled at all so there would be no reason to send the user
to check their BIOS - instead they can learn they have offline CPUs and they can
work on fixing that. In my opinion it could be beneficial to have more specialized
messages in the selftests to help users diagnose problems quicker.

My goal is indeed to have specialized messages. There cannot be a specialized message
if snc_reliable == 1. In this case it needs to be generic since SNC may or may not be
enabled and it is up to the user to investigate further.

How about this in cmt_run_test() for example:

	if (snc_unreliable)
		ksft_print_msg("Intel CMT may be inaccurate or inefficient when Sub-NUMA Clustering is enabled and not properly detected.\n");

It is ok with me if you want to keep the message even if the test succeeds. Without seeing
the new implementation it is unclear to me why the SNC check below is guarded by an ARCH_INTEL
check while the one above is not. Ideally this should be consistent to not confuse how
the architectures need to be treated here.

The message does sound a bit vague to me about being able to detect SNC. How about something
like:
	Sub-NUMA Clustering could not be detected properly (see earlier messages for details).
	Intel CMT may be inaccurate.


	else if (ret && (get_vendor() == ARCH_INTEL) && !snc_kernel_support())
		ksft_print_msg("Kernel doesn't support Sub-NUMA Clustering but it is enabled. Check BIOS configuration.\n");

The "Check BIOS configuration" guidance is not clear to me. If the kernel does not
support SNC then the user could also be guided to upgrade their kernel? Perhaps that
snippet can just be dropped? To be more specific that SNC enabling is not a kernel
configuration but a system configuration this can read (please feel free to improve):
	Kernel doesn't support Sub-NUMA Clustering but it is enabled on the system.
This way there is a generic message when snc_unreliable == 1.

And as you mentioned at the end of this email, the user can be expected to
backtrack to the beginning of the test if there are any problems so they can
discover the exact source of the issue - offline cpus.



Having only this one message wihtout the "if snc unreliable" messages would
mean nothing would get printed at the end on success with unreliable SNC detection.

Having a pass/fail is what user will focus on. If the test passes then SNC detection
should not matter. The messages are just there to help user root cause where a failure
may be.

My train of thought was that if test passes with broken SNC detection it means
SNC was used inefficiently right? (either the portion of L3 used was bigger or
smaller than that allocated for one cluster)

It's not exactly a failure but I thought it deserves a warning at the very end
to alert the user.

If you don't think the warning should be printed on success I guess the
condition can be:
	if (ret && snc_unreliable)
and the user can look at the start of the test if they care about more
information. And the message can lose the "inefficient" word since it would only
happen on error.


Reinette




[Index of Archives]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux