On 1/23/2025 10:46, Srinivas Pandruvada wrote:
On 1/20/25 18:42, zhenglifeng (A) wrote:
On 2025/1/21 1:44, Mario Limonciello wrote:
On 1/20/2025 08:49, Pierre Gondois wrote:
On 1/20/25 04:15, zhenglifeng (A) wrote:
On 2025/1/17 22:30, Mario Limonciello wrote:
On 1/16/2025 21:11, zhenglifeng (A) wrote:
On 2025/1/16 19:39, Russell Haley wrote:
Hello,
I noticed something here just as a user casually browsing the
mailing list.
On 1/13/25 6:21 AM, Lifeng Zheng wrote:
Add sysfs interfaces for CPPC autonomous selection in the
cppc_cpufreq
driver.
Signed-off-by: Lifeng Zheng <zhenglifeng1@xxxxxxxxxx>
---
.../ABI/testing/sysfs-devices-system-cpu | 54 +++++++++
drivers/cpufreq/cppc_cpufreq.c | 109 +++++++
++++ +++++++
2 files changed, 163 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu
b/ Documentation/ABI/testing/sysfs-devices-system-cpu
index 206079d3bd5b..3d87c3bb3fe2 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -268,6 +268,60 @@ Description: Discover CPUs in the same
CPU frequency coordination domain
This file is only present if the acpi-cpufreq or
the cppc-cpufreq
drivers are in use.
[...snip...]
+What: /sys/devices/system/cpu/cpuX/cpufreq/energy_perf
+Date: October 2024
+Contact: linux-pm@xxxxxxxxxxxxxxx
+Description: Energy performance preference
+
+ Read/write an 8-bit integer from/to this file. This file
+ represents a range of values from 0 (performance
preference) to
+ 0xFF (energy efficiency preference) that influences
the rate of
+ performance increase/decrease and the result of the
hardware's
+ energy efficiency and performance optimization policies.
+
+ Writing to this file only has meaning when Autonomous
Selection is
+ enabled.
+
+ This file only presents if the cppc-cpufreq driver is
in use.
In intel_pstate driver, there is file with near-identical
semantics:
/sys/devices/system/cpu/cpuX/cpufreq/energy_performance_preference
It also accepts a few string arguments and converts them to
integers.
Perhaps the same name should be used, and the semantics made
exactly
identical, and then it could be documented as present for either
cppc_cpufreq OR intel_pstate?
I think would be more elegant if userspace tooling could Just
Work with
either driver.
One might object that the frequency selection behavior that
results from
any particular value of the register itself might be different,
but they
are *already* different between Intel's P and E-cores in the
same CPU
package. (Ugh.)
Yes, I should use the same name. Thanks.
As for accepting string arguments and converting them to
integers, I don't
think it is necessary. It'll be a litte confused if someone
writes a raw
value and reads a string I think. I prefer to let users freely
set this
value.
In addition, there are many differences between the
implementations of
energy_performance_preference in intel_pstate and cppc_cpufreq (and
amd-pstate...). It is really difficult to explain all this
differences in
this document. So I'll leave it to be documented as present for
cppc_cpufreq only.
At least the interface to userspace I think we should do the best
we can to be the same between all the drivers if possible.
For example; I've got a patch that I may bring up in a future
kernel cycle that adds raw integer writes to amd-pstates
energy_performance_profile to behave the same way intel-pstate does.
I agree that it's better to keep this interface consistent across
different
drivers. But in my opinion, the implementation of intel_pstate
energy_performance_preference is not really nice. Someone may write
a raw
value but read a string, or read strings for some values and read raw
values for some other values. It is inconsistent. It may be better
to use
some other implementation, such as seperating the operations of r/w
strings
and raw values into two files.
I agree it would be better to be sure of the type to expect when
reading the
energy_performance_preference file. The epp values in the range
0-255 with 0
being the performance value for all interfaces.
In the current epp strings, it seems there is a big gap between the
PERFORMANCE
and the BALANCE_PERFORMANCE strings. Maybe it would be good to
complete it:
EPP_PERFORMANCE 0x00
EPP_BALANCE_PERFORMANCE 0x40 // state value changed
EPP_BALANCE 0x80 // new state
EPP_BALANCE_POWERSAVE 0xC0
EPP_POWERSAVE 0xFF
NIT: The mapping seems to be slightly different for intel_pstate and
amd-pstate
currently:
drivers/cpufreq/amd-pstate.c
#define AMD_CPPC_EPP_PERFORMANCE 0x00
#define AMD_CPPC_EPP_BALANCE_PERFORMANCE 0x80
#define AMD_CPPC_EPP_BALANCE_POWERSAVE 0xBF
#define AMD_CPPC_EPP_POWERSAVE 0xFF
arch/x86/include/asm/msr-index.h
#define HWP_EPP_PERFORMANCE 0x00
#define HWP_EPP_BALANCE_PERFORMANCE 0x80
#define HWP_EPP_BALANCE_POWERSAVE 0xC0 <------ Different from
AMD_CPPC_EPP_BALANCE_POWERSAVE
#define HWP_EPP_POWERSAVE 0xFF
I think it's better to consult Rafael and Viresh about how this should
evolve.
Yes indeed
Maybe it's best to discuss what the goal of raw EPP number writes is
to decide what to do with it.
IE in intel-pstate is it for userspace to be able to actually utilize
something besides the strings all the time? Or is it just for
debugging to find better values for strings in the future?
If the former maybe we're better off splitting to
'energy_performance_preference' and 'energy_performance_preference_int'.
If the latter maybe we're better off putting the integer writes and
reads into debugfs instead and making 'energy_performance_preference'
return -EINVAL while a non-predefined value is in use.
In Intel case EPP values can be different based on processor. In some
case they they end up sharing the same CPU model. So strings are not
suitable for all cases. Also there is different preference of EPP
between Chrome systems and non chrome distro. For example Chrome has
some resource manager which can change and same on Intel distros with LPMD.
Thanks for confirming it is intentional and changing it would break
existing userspace.
And FWIW even in Windows there are more than 4 situational values used
like we have in Linux today.
As the status quo is there I personally feel that we should do the exact
same for other implementation of energy_performance_preference.
Thanks,
Srinivas
I think it's the former.
I added the author of the patch that allows raw energy performance
preference value in intel_pstate to ask about what the goal is and if
this
would be ok to do the modification mentioned above.
To see the patch from https://lore.kernel.org/
all/20200626183401.1495090-3-srinivas.pandruvada@xxxxxxxxxxxxxxx/
Anyway, the purpose of this patch is to allow users write and read raw
EPP
number. So maybe I can just rename the file to
'energy_performance_preference_int'?