Hello, Sorry for my delay due to the recent holiday. On 2025/1/24 22:18, srinivas pandruvada wrote: > On Fri, 2025-01-24 at 11:53 +0800, zhenglifeng (A) wrote: >> On 2025/1/24 1:05, Mario Limonciello wrote: >> >>> On 1/23/2025 10:46, Srinivas Pandruvada wrote: >>>> >>>> On 1/20/25 18:42, zhenglifeng (A) wrote: >>>>> On 2025/1/21 1:44, Mario Limonciello wrote: >>>>> >>>>>> On 1/20/2025 08:49, Pierre Gondois wrote: >>>>>>> >>>>>>> On 1/20/25 04:15, zhenglifeng (A) wrote: >>>>>>>> On 2025/1/17 22:30, Mario Limonciello wrote: >>>>>>>> >>>>>>>>> On 1/16/2025 21:11, zhenglifeng (A) wrote: >>>>>>>>>> On 2025/1/16 19:39, Russell Haley wrote: >>>>>>>>>> >>>>>>>>>>> Hello, >>>>>>>>>>> >>>>>>>>>>> I noticed something here just as a user casually >>>>>>>>>>> browsing the mailing list. >>>>>>>>>>> >>>>>>>>>>> On 1/13/25 6:21 AM, Lifeng Zheng wrote: >>>>>>>>>>>> Add sysfs interfaces for CPPC autonomous >>>>>>>>>>>> selection in the cppc_cpufreq >>>>>>>>>>>> driver. >>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Lifeng Zheng >>>>>>>>>>>> <zhenglifeng1@xxxxxxxxxx> >>>>>>>>>>>> --- >>>>>>>>>>>> .../ABI/testing/sysfs-devices-system-cpu >>>>>>>>>>>> | 54 +++++++++ >>>>>>>>>>>> drivers/cpufreq/cppc_cpufreq.c >>>>>>>>>>>> | 109 +++++++ ++++ +++++++ >>>>>>>>>>>> 2 files changed, 163 insertions(+) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/Documentation/ABI/testing/sysfs- >>>>>>>>>>>> devices-system-cpu b/ >>>>>>>>>>>> Documentation/ABI/testing/sysfs-devices-system- >>>>>>>>>>>> cpu >>>>>>>>>>>> index 206079d3bd5b..3d87c3bb3fe2 100644 >>>>>>>>>>>> --- a/Documentation/ABI/testing/sysfs-devices- >>>>>>>>>>>> system-cpu >>>>>>>>>>>> +++ b/Documentation/ABI/testing/sysfs-devices- >>>>>>>>>>>> system-cpu >>>>>>>>>>>> @@ -268,6 +268,60 @@ Description: Discover >>>>>>>>>>>> CPUs in the same CPU frequency coordination >>>>>>>>>>>> domain >>>>>>>>>>>> This file is only present if the >>>>>>>>>>>> acpi-cpufreq or the cppc-cpufreq >>>>>>>>>>>> drivers are in use. >>>>>>>>>>> [...snip...] >>>>>>>>>>> >>>>>>>>>>>> +What: >>>>>>>>>>>> /sys/devices/system/cpu/cpuX/cpufreq/energy_perf >>>>>>>>>>>> +Date: October 2024 >>>>>>>>>>>> +Contact: linux-pm@xxxxxxxxxxxxxxx >>>>>>>>>>>> +Description: Energy performance preference >>>>>>>>>>>> + >>>>>>>>>>>> + Read/write an 8-bit integer from/to this >>>>>>>>>>>> file. This file >>>>>>>>>>>> + represents a range of values from 0 >>>>>>>>>>>> (performance preference) to >>>>>>>>>>>> + 0xFF (energy efficiency preference) that >>>>>>>>>>>> influences the rate of >>>>>>>>>>>> + performance increase/decrease and the >>>>>>>>>>>> result of the hardware's >>>>>>>>>>>> + energy efficiency and performance >>>>>>>>>>>> optimization policies. >>>>>>>>>>>> + >>>>>>>>>>>> + Writing to this file only has meaning >>>>>>>>>>>> when Autonomous Selection is >>>>>>>>>>>> + enabled. >>>>>>>>>>>> + >>>>>>>>>>>> + This file only presents if the cppc- >>>>>>>>>>>> cpufreq driver is in use. >>>>>>>>>>> In intel_pstate driver, there is file with near- >>>>>>>>>>> identical semantics: >>>>>>>>>>> >>>>>>>>>>> /sys/devices/system/cpu/cpuX/cpufreq/energy_perform >>>>>>>>>>> ance_preference >>>>>>>>>>> >>>>>>>>>>> It also accepts a few string arguments and converts >>>>>>>>>>> them to integers. >>>>>>>>>>> >>>>>>>>>>> Perhaps the same name should be used, and the >>>>>>>>>>> semantics made exactly >>>>>>>>>>> identical, and then it could be documented as >>>>>>>>>>> present for either >>>>>>>>>>> cppc_cpufreq OR intel_pstate? >>>>>>>>>>> >>>>>>>>>>> I think would be more elegant if userspace tooling >>>>>>>>>>> could Just Work with >>>>>>>>>>> either driver. >>>>>>>>>>> >>>>>>>>>>> One might object that the frequency selection >>>>>>>>>>> behavior that results from >>>>>>>>>>> any particular value of the register itself might >>>>>>>>>>> be different, but they >>>>>>>>>>> are *already* different between Intel's P and E- >>>>>>>>>>> cores in the same CPU >>>>>>>>>>> package. (Ugh.) >>>>>>>>>> Yes, I should use the same name. Thanks. >>>>>>>>>> >>>>>>>>>> As for accepting string arguments and converting them >>>>>>>>>> to integers, I don't >>>>>>>>>> think it is necessary. It'll be a litte confused if >>>>>>>>>> someone writes a raw >>>>>>>>>> value and reads a string I think. I prefer to let >>>>>>>>>> users freely set this >>>>>>>>>> value. >>>>>>>>>> >>>>>>>>>> In addition, there are many differences between the >>>>>>>>>> implementations of >>>>>>>>>> energy_performance_preference in intel_pstate and >>>>>>>>>> cppc_cpufreq (and >>>>>>>>>> amd-pstate...). It is really difficult to explain all >>>>>>>>>> this differences in >>>>>>>>>> this document. So I'll leave it to be documented as >>>>>>>>>> present for >>>>>>>>>> cppc_cpufreq only. >>>>>>>>> At least the interface to userspace I think we should >>>>>>>>> do the best we can to be the same between all the >>>>>>>>> drivers if possible. >>>>>>>>> >>>>>>>>> For example; I've got a patch that I may bring up in a >>>>>>>>> future kernel cycle that adds raw integer writes to >>>>>>>>> amd-pstates energy_performance_profile to behave the >>>>>>>>> same way intel-pstate does. >>>>>>>> I agree that it's better to keep this interface >>>>>>>> consistent across different >>>>>>>> drivers. But in my opinion, the implementation of >>>>>>>> intel_pstate >>>>>>>> energy_performance_preference is not really nice. Someone >>>>>>>> may write a raw >>>>>>>> value but read a string, or read strings for some values >>>>>>>> and read raw >>>>>>>> values for some other values. It is inconsistent. It may >>>>>>>> be better to use >>>>>>>> some other implementation, such as seperating the >>>>>>>> operations of r/w strings >>>>>>>> and raw values into two files. >>>>>>> I agree it would be better to be sure of the type to expect >>>>>>> when reading the >>>>>>> energy_performance_preference file. The epp values in the >>>>>>> range 0-255 with 0 >>>>>>> being the performance value for all interfaces. >>>>>>> >>>>>>> In the current epp strings, it seems there is a big gap >>>>>>> between the PERFORMANCE >>>>>>> and the BALANCE_PERFORMANCE strings. Maybe it would be good >>>>>>> to complete it: >>>>>>> EPP_PERFORMANCE 0x00 >>>>>>> EPP_BALANCE_PERFORMANCE 0x40 // state value changed >>>>>>> EPP_BALANCE 0x80 // new state >>>>>>> EPP_BALANCE_POWERSAVE 0xC0 >>>>>>> EPP_POWERSAVE 0xFF >>>>>>> >>>>>>> NIT: The mapping seems to be slightly different for >>>>>>> intel_pstate and amd-pstate >>>>>>> currently: >>>>>>> drivers/cpufreq/amd-pstate.c >>>>>>> #define AMD_CPPC_EPP_PERFORMANCE 0x00 >>>>>>> #define AMD_CPPC_EPP_BALANCE_PERFORMANCE 0x80 >>>>>>> #define AMD_CPPC_EPP_BALANCE_POWERSAVE 0xBF >>>>>>> #define AMD_CPPC_EPP_POWERSAVE 0xFF >>>>>>> >>>>>>> arch/x86/include/asm/msr-index.h >>>>>>> #define HWP_EPP_PERFORMANCE 0x00 >>>>>>> #define HWP_EPP_BALANCE_PERFORMANCE 0x80 >>>>>>> #define HWP_EPP_BALANCE_POWERSAVE 0xC0 <------ >>>>>>> Different from AMD_CPPC_EPP_BALANCE_POWERSAVE >>>>>>> #define HWP_EPP_POWERSAVE 0xFF >>>>>>> >>>>>>>> I think it's better to consult Rafael and Viresh about >>>>>>>> how this should >>>>>>>> evolve. >>>>>>> Yes indeed >>>>>> Maybe it's best to discuss what the goal of raw EPP number >>>>>> writes is to decide what to do with it. >>>>>> >>>>>> IE in intel-pstate is it for userspace to be able to actually >>>>>> utilize something besides the strings all the time? Or is it >>>>>> just for debugging to find better values for strings in the >>>>>> future? >>>>>> >>>>>> If the former maybe we're better off splitting to >>>>>> 'energy_performance_preference' and >>>>>> 'energy_performance_preference_int'. >>>>>> >>>>>> If the latter maybe we're better off putting the integer >>>>>> writes and reads into debugfs instead and making >>>>>> 'energy_performance_preference' return -EINVAL while a non- >>>>>> predefined value is in use. >>>> >>>> In Intel case EPP values can be different based on processor. In >>>> some case they they end up sharing the same CPU model. So strings >>>> are not suitable for all cases. Also there is different >>>> preference of EPP between Chrome systems and non chrome distro. >>>> For example Chrome has some resource manager which can change and >>>> same on Intel distros with LPMD. >>>> >>> >>> Thanks for confirming it is intentional and changing it would break >>> existing userspace. >>> >>> And FWIW even in Windows there are more than 4 situational values >>> used like we have in Linux today. >>> >>> As the status quo is there I personally feel that we should do the >>> exact same for other implementation of >>> energy_performance_preference. >> >> I still don't think this implementation is nice, for the following >> reasons: >> >> 1. Users may write raw value but read string. It's odd. >> >> 2. Sometimes a raw value is read and sometimes a character string is >> read. >> The userspace tool needs to adapt this. >> >> 3. Reading and writing EPP strings is not really general in driver. >> It is >> more reasonable to use the userspace tool to implement it. >> >> In order not to break existing userspace, I'll rename the file to >> 'energy_performance_preference_int' or >> 'energy_performance_preference_val' >> in cppc_cpufreq and only support reading and writing raw value. As >> for >> accepting string arguments, it's not necessary for cppc_cpufreq for >> now. >> It's easy to add this feature but hard to remove, so I'll leave it to >> the >> future if it is really needed. >> >> As for amd-pstate and intel_pstate, you can decide how >> energy_performance_preference should evolve. But I strongly recommend >> splitting it. > > No. User space can deal with this already. I know that. What I mean is that user space had to do more to adapt this "sometims value sometimes string" situation. This could have been avoided. > Atleast this has one > interface. If you split you need to keep them consistent. You can write > a raw value to new attribute and then can read a string from the other > attribute, which means different. I won't do any change in amd-pstate and intel_pstate because it involves too much. I'll only add a new file in cppc_cpufreq for reading and writing EPP raw value. > > This is not the only place where strings and raw values can be written > in sysfs. Also true for energy_perf_bias. > > > Thanks, > Srinivas > > > >> >> Regards, >> >> Lifeng >> >>> >>>> >>>> Thanks, >>>> >>>> Srinivas >>>> >>>> >>>>> I think it's the former. >>>>> >>>>> I added the author of the patch that allows raw energy >>>>> performance >>>>> preference value in intel_pstate to ask about what the goal is >>>>> and if this >>>>> would be ok to do the modification mentioned above. >>>>> >>>>> To see the patch from >>>>> https://lore.kernel.org/ all/20200626183401.1495090-3-srinivas.pandruvada@linux.intel.c >>>>> om/ >>>>> >>>>> Anyway, the purpose of this patch is to allow users write and >>>>> read raw EPP >>>>> number. So maybe I can just rename the file to >>>>> 'energy_performance_preference_int'? >>>>> >>> >> > >