On Fri, 2025-01-24 at 11:53 +0800, zhenglifeng (A) wrote: > On 2025/1/24 1:05, Mario Limonciello wrote: > > > On 1/23/2025 10:46, Srinivas Pandruvada wrote: > > > > > > On 1/20/25 18:42, zhenglifeng (A) wrote: > > > > On 2025/1/21 1:44, Mario Limonciello wrote: > > > > > > > > > On 1/20/2025 08:49, Pierre Gondois wrote: > > > > > > > > > > > > On 1/20/25 04:15, zhenglifeng (A) wrote: > > > > > > > On 2025/1/17 22:30, Mario Limonciello wrote: > > > > > > > > > > > > > > > On 1/16/2025 21:11, zhenglifeng (A) wrote: > > > > > > > > > On 2025/1/16 19:39, Russell Haley wrote: > > > > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > > > > > I noticed something here just as a user casually > > > > > > > > > > browsing the mailing list. > > > > > > > > > > > > > > > > > > > > On 1/13/25 6:21 AM, Lifeng Zheng wrote: > > > > > > > > > > > Add sysfs interfaces for CPPC autonomous > > > > > > > > > > > selection in the cppc_cpufreq > > > > > > > > > > > driver. > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Lifeng Zheng > > > > > > > > > > > <zhenglifeng1@xxxxxxxxxx> > > > > > > > > > > > --- > > > > > > > > > > > .../ABI/testing/sysfs-devices-system-cpu > > > > > > > > > > > | 54 +++++++++ > > > > > > > > > > > drivers/cpufreq/cppc_cpufreq.c > > > > > > > > > > > | 109 +++++++ ++++ +++++++ > > > > > > > > > > > 2 files changed, 163 insertions(+) > > > > > > > > > > > > > > > > > > > > > > diff --git a/Documentation/ABI/testing/sysfs- > > > > > > > > > > > devices-system-cpu b/ > > > > > > > > > > > Documentation/ABI/testing/sysfs-devices-system- > > > > > > > > > > > cpu > > > > > > > > > > > index 206079d3bd5b..3d87c3bb3fe2 100644 > > > > > > > > > > > --- a/Documentation/ABI/testing/sysfs-devices- > > > > > > > > > > > system-cpu > > > > > > > > > > > +++ b/Documentation/ABI/testing/sysfs-devices- > > > > > > > > > > > system-cpu > > > > > > > > > > > @@ -268,6 +268,60 @@ Description: Discover > > > > > > > > > > > CPUs in the same CPU frequency coordination > > > > > > > > > > > domain > > > > > > > > > > > This file is only present if the > > > > > > > > > > > acpi-cpufreq or the cppc-cpufreq > > > > > > > > > > > drivers are in use. > > > > > > > > > > [...snip...] > > > > > > > > > > > > > > > > > > > > > +What: > > > > > > > > > > > /sys/devices/system/cpu/cpuX/cpufreq/energy_perf > > > > > > > > > > > +Date: October 2024 > > > > > > > > > > > +Contact: linux-pm@xxxxxxxxxxxxxxx > > > > > > > > > > > +Description: Energy performance preference > > > > > > > > > > > + > > > > > > > > > > > + Read/write an 8-bit integer from/to this > > > > > > > > > > > file. This file > > > > > > > > > > > + represents a range of values from 0 > > > > > > > > > > > (performance preference) to > > > > > > > > > > > + 0xFF (energy efficiency preference) that > > > > > > > > > > > influences the rate of > > > > > > > > > > > + performance increase/decrease and the > > > > > > > > > > > result of the hardware's > > > > > > > > > > > + energy efficiency and performance > > > > > > > > > > > optimization policies. > > > > > > > > > > > + > > > > > > > > > > > + Writing to this file only has meaning > > > > > > > > > > > when Autonomous Selection is > > > > > > > > > > > + enabled. > > > > > > > > > > > + > > > > > > > > > > > + This file only presents if the cppc- > > > > > > > > > > > cpufreq driver is in use. > > > > > > > > > > In intel_pstate driver, there is file with near- > > > > > > > > > > identical semantics: > > > > > > > > > > > > > > > > > > > > /sys/devices/system/cpu/cpuX/cpufreq/energy_perform > > > > > > > > > > ance_preference > > > > > > > > > > > > > > > > > > > > It also accepts a few string arguments and converts > > > > > > > > > > them to integers. > > > > > > > > > > > > > > > > > > > > Perhaps the same name should be used, and the > > > > > > > > > > semantics made exactly > > > > > > > > > > identical, and then it could be documented as > > > > > > > > > > present for either > > > > > > > > > > cppc_cpufreq OR intel_pstate? > > > > > > > > > > > > > > > > > > > > I think would be more elegant if userspace tooling > > > > > > > > > > could Just Work with > > > > > > > > > > either driver. > > > > > > > > > > > > > > > > > > > > One might object that the frequency selection > > > > > > > > > > behavior that results from > > > > > > > > > > any particular value of the register itself might > > > > > > > > > > be different, but they > > > > > > > > > > are *already* different between Intel's P and E- > > > > > > > > > > cores in the same CPU > > > > > > > > > > package. (Ugh.) > > > > > > > > > Yes, I should use the same name. Thanks. > > > > > > > > > > > > > > > > > > As for accepting string arguments and converting them > > > > > > > > > to integers, I don't > > > > > > > > > think it is necessary. It'll be a litte confused if > > > > > > > > > someone writes a raw > > > > > > > > > value and reads a string I think. I prefer to let > > > > > > > > > users freely set this > > > > > > > > > value. > > > > > > > > > > > > > > > > > > In addition, there are many differences between the > > > > > > > > > implementations of > > > > > > > > > energy_performance_preference in intel_pstate and > > > > > > > > > cppc_cpufreq (and > > > > > > > > > amd-pstate...). It is really difficult to explain all > > > > > > > > > this differences in > > > > > > > > > this document. So I'll leave it to be documented as > > > > > > > > > present for > > > > > > > > > cppc_cpufreq only. > > > > > > > > At least the interface to userspace I think we should > > > > > > > > do the best we can to be the same between all the > > > > > > > > drivers if possible. > > > > > > > > > > > > > > > > For example; I've got a patch that I may bring up in a > > > > > > > > future kernel cycle that adds raw integer writes to > > > > > > > > amd-pstates energy_performance_profile to behave the > > > > > > > > same way intel-pstate does. > > > > > > > I agree that it's better to keep this interface > > > > > > > consistent across different > > > > > > > drivers. But in my opinion, the implementation of > > > > > > > intel_pstate > > > > > > > energy_performance_preference is not really nice. Someone > > > > > > > may write a raw > > > > > > > value but read a string, or read strings for some values > > > > > > > and read raw > > > > > > > values for some other values. It is inconsistent. It may > > > > > > > be better to use > > > > > > > some other implementation, such as seperating the > > > > > > > operations of r/w strings > > > > > > > and raw values into two files. > > > > > > I agree it would be better to be sure of the type to expect > > > > > > when reading the > > > > > > energy_performance_preference file. The epp values in the > > > > > > range 0-255 with 0 > > > > > > being the performance value for all interfaces. > > > > > > > > > > > > In the current epp strings, it seems there is a big gap > > > > > > between the PERFORMANCE > > > > > > and the BALANCE_PERFORMANCE strings. Maybe it would be good > > > > > > to complete it: > > > > > > EPP_PERFORMANCE 0x00 > > > > > > EPP_BALANCE_PERFORMANCE 0x40 // state value changed > > > > > > EPP_BALANCE 0x80 // new state > > > > > > EPP_BALANCE_POWERSAVE 0xC0 > > > > > > EPP_POWERSAVE 0xFF > > > > > > > > > > > > NIT: The mapping seems to be slightly different for > > > > > > intel_pstate and amd-pstate > > > > > > currently: > > > > > > drivers/cpufreq/amd-pstate.c > > > > > > #define AMD_CPPC_EPP_PERFORMANCE 0x00 > > > > > > #define AMD_CPPC_EPP_BALANCE_PERFORMANCE 0x80 > > > > > > #define AMD_CPPC_EPP_BALANCE_POWERSAVE 0xBF > > > > > > #define AMD_CPPC_EPP_POWERSAVE 0xFF > > > > > > > > > > > > arch/x86/include/asm/msr-index.h > > > > > > #define HWP_EPP_PERFORMANCE 0x00 > > > > > > #define HWP_EPP_BALANCE_PERFORMANCE 0x80 > > > > > > #define HWP_EPP_BALANCE_POWERSAVE 0xC0 <------ > > > > > > Different from AMD_CPPC_EPP_BALANCE_POWERSAVE > > > > > > #define HWP_EPP_POWERSAVE 0xFF > > > > > > > > > > > > > I think it's better to consult Rafael and Viresh about > > > > > > > how this should > > > > > > > evolve. > > > > > > Yes indeed > > > > > Maybe it's best to discuss what the goal of raw EPP number > > > > > writes is to decide what to do with it. > > > > > > > > > > IE in intel-pstate is it for userspace to be able to actually > > > > > utilize something besides the strings all the time? Or is it > > > > > just for debugging to find better values for strings in the > > > > > future? > > > > > > > > > > If the former maybe we're better off splitting to > > > > > 'energy_performance_preference' and > > > > > 'energy_performance_preference_int'. > > > > > > > > > > If the latter maybe we're better off putting the integer > > > > > writes and reads into debugfs instead and making > > > > > 'energy_performance_preference' return -EINVAL while a non- > > > > > predefined value is in use. > > > > > > In Intel case EPP values can be different based on processor. In > > > some case they they end up sharing the same CPU model. So strings > > > are not suitable for all cases. Also there is different > > > preference of EPP between Chrome systems and non chrome distro. > > > For example Chrome has some resource manager which can change and > > > same on Intel distros with LPMD. > > > > > > > Thanks for confirming it is intentional and changing it would break > > existing userspace. > > > > And FWIW even in Windows there are more than 4 situational values > > used like we have in Linux today. > > > > As the status quo is there I personally feel that we should do the > > exact same for other implementation of > > energy_performance_preference. > > I still don't think this implementation is nice, for the following > reasons: > > 1. Users may write raw value but read string. It's odd. > > 2. Sometimes a raw value is read and sometimes a character string is > read. > The userspace tool needs to adapt this. > > 3. Reading and writing EPP strings is not really general in driver. > It is > more reasonable to use the userspace tool to implement it. > > In order not to break existing userspace, I'll rename the file to > 'energy_performance_preference_int' or > 'energy_performance_preference_val' > in cppc_cpufreq and only support reading and writing raw value. As > for > accepting string arguments, it's not necessary for cppc_cpufreq for > now. > It's easy to add this feature but hard to remove, so I'll leave it to > the > future if it is really needed. > > As for amd-pstate and intel_pstate, you can decide how > energy_performance_preference should evolve. But I strongly recommend > splitting it. No. User space can deal with this already. Atleast this has one interface. If you split you need to keep them consistent. You can write a raw value to new attribute and then can read a string from the other attribute, which means different. This is not the only place where strings and raw values can be written in sysfs. Also true for energy_perf_bias. Thanks, Srinivas > > Regards, > > Lifeng > > > > > > > > > Thanks, > > > > > > Srinivas > > > > > > > > > > I think it's the former. > > > > > > > > I added the author of the patch that allows raw energy > > > > performance > > > > preference value in intel_pstate to ask about what the goal is > > > > and if this > > > > would be ok to do the modification mentioned above. > > > > > > > > To see the patch from > > > > https://lore.kernel.org/ all/20200626183401.1495090-3-srinivas.pandruvada@linux.intel.c > > > > om/ > > > > > > > > Anyway, the purpose of this patch is to allow users write and > > > > read raw EPP > > > > number. So maybe I can just rename the file to > > > > 'energy_performance_preference_int'? > > > > > > >