Re: [PATCH v4 6/6] cpufreq: CPPC: Support for autonomous selection in cppc_cpufreq

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 1/23/25 9:53 PM, zhenglifeng (A) wrote:
> On 2025/1/24 1:05, Mario Limonciello wrote:
> 
>> On 1/23/2025 10:46, Srinivas Pandruvada wrote:
>>>
>>> On 1/20/25 18:42, zhenglifeng (A) wrote:
>>>> On 2025/1/21 1:44, Mario Limonciello wrote:
>>>>
>>>>> On 1/20/2025 08:49, Pierre Gondois wrote:
>>>>>>
>>>>>> On 1/20/25 04:15, zhenglifeng (A) wrote:
>>>>>>> On 2025/1/17 22:30, Mario Limonciello wrote:
>>>>>>>
>>>>>>>> On 1/16/2025 21:11, zhenglifeng (A) wrote:
>>>>>>>>> On 2025/1/16 19:39, Russell Haley wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I noticed something here just as a user casually browsing the mailing list.
>>>>>>>>>>
>>>>>>>>>> On 1/13/25 6:21 AM, Lifeng Zheng wrote:
>>>>>>>>>>> Add sysfs interfaces for CPPC autonomous selection in the cppc_cpufreq
>>>>>>>>>>> driver.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Lifeng Zheng <zhenglifeng1@xxxxxxxxxx>
>>>>>>>>>>> ---
>>>>>>>>>>>     .../ABI/testing/sysfs-devices-system-cpu      |  54 +++++++++
>>>>>>>>>>>     drivers/cpufreq/cppc_cpufreq.c                | 109 +++++++ ++++ +++++++
>>>>>>>>>>>     2 files changed, 163 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/ Documentation/ABI/testing/sysfs-devices-system-cpu
>>>>>>>>>>> index 206079d3bd5b..3d87c3bb3fe2 100644
>>>>>>>>>>> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
>>>>>>>>>>> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
>>>>>>>>>>> @@ -268,6 +268,60 @@ Description:    Discover CPUs in the same CPU frequency coordination domain
>>>>>>>>>>>             This file is only present if the acpi-cpufreq or the cppc-cpufreq
>>>>>>>>>>>             drivers are in use.
>>>>>>>>>> [...snip...]
>>>>>>>>>>
>>>>>>>>>>> +What:        /sys/devices/system/cpu/cpuX/cpufreq/energy_perf
>>>>>>>>>>> +Date:        October 2024
>>>>>>>>>>> +Contact:    linux-pm@xxxxxxxxxxxxxxx
>>>>>>>>>>> +Description:    Energy performance preference
>>>>>>>>>>> +
>>>>>>>>>>> +        Read/write an 8-bit integer from/to this file. This file
>>>>>>>>>>> +        represents a range of values from 0 (performance preference) to
>>>>>>>>>>> +        0xFF (energy efficiency preference) that influences the rate of
>>>>>>>>>>> +        performance increase/decrease and the result of the hardware's
>>>>>>>>>>> +        energy efficiency and performance optimization policies.
>>>>>>>>>>> +
>>>>>>>>>>> +        Writing to this file only has meaning when Autonomous Selection is
>>>>>>>>>>> +        enabled.
>>>>>>>>>>> +
>>>>>>>>>>> +        This file only presents if the cppc-cpufreq driver is in use.
>>>>>>>>>> In intel_pstate driver, there is file with near-identical semantics:
>>>>>>>>>>
>>>>>>>>>> /sys/devices/system/cpu/cpuX/cpufreq/energy_performance_preference
>>>>>>>>>>
>>>>>>>>>> It also accepts a few string arguments and converts them to integers.
>>>>>>>>>>
>>>>>>>>>> Perhaps the same name should be used, and the semantics made exactly
>>>>>>>>>> identical, and then it could be documented as present for either
>>>>>>>>>> cppc_cpufreq OR intel_pstate?
>>>>>>>>>>
>>>>>>>>>> I think would be more elegant if userspace tooling could Just Work with
>>>>>>>>>> either driver.
>>>>>>>>>>
>>>>>>>>>> One might object that the frequency selection behavior that results from
>>>>>>>>>> any particular value of the register itself might be different, but they
>>>>>>>>>> are *already* different between Intel's P and E-cores in the same CPU
>>>>>>>>>> package. (Ugh.)
>>>>>>>>> Yes, I should use the same name. Thanks.
>>>>>>>>>
>>>>>>>>> As for accepting string arguments and converting them to integers, I don't
>>>>>>>>> think it is necessary. It'll be a litte confused if someone writes a raw
>>>>>>>>> value and reads a string I think. I prefer to let users freely set this
>>>>>>>>> value.
>>>>>>>>>
>>>>>>>>> In addition, there are many differences between the implementations of
>>>>>>>>> energy_performance_preference in intel_pstate and cppc_cpufreq (and
>>>>>>>>> amd-pstate...). It is really difficult to explain all this differences in
>>>>>>>>> this document. So I'll leave it to be documented as present for
>>>>>>>>> cppc_cpufreq only.
>>>>>>>> At least the interface to userspace I think we should do the best we can to be the same between all the drivers if possible.
>>>>>>>>
>>>>>>>> For example; I've got a patch that I may bring up in a future kernel cycle that adds raw integer writes to amd-pstates energy_performance_profile to behave the same way intel-pstate does.
>>>>>>> I agree that it's better to keep this interface consistent across different
>>>>>>> drivers. But in my opinion, the implementation of intel_pstate
>>>>>>> energy_performance_preference is not really nice. Someone may write a raw
>>>>>>> value but read a string, or read strings for some values and read raw
>>>>>>> values for some other values. It is inconsistent. It may be better to use
>>>>>>> some other implementation, such as seperating the operations of r/w strings
>>>>>>> and raw values into two files.
>>>>>> I agree it would be better to be sure of the type to expect when reading the
>>>>>> energy_performance_preference file. The epp values in the range 0-255 with 0
>>>>>> being the performance value for all interfaces.
>>>>>>
>>>>>> In the current epp strings, it seems there is a big gap between the PERFORMANCE
>>>>>> and the BALANCE_PERFORMANCE strings. Maybe it would be good to complete it:
>>>>>> EPP_PERFORMANCE        0x00
>>>>>> EPP_BALANCE_PERFORMANCE    0x40      // state value changed
>>>>>> EPP_BALANCE        0x80      // new state
>>>>>> EPP_BALANCE_POWERSAVE    0xC0
>>>>>> EPP_POWERSAVE        0xFF
>>>>>>
>>>>>> NIT: The mapping seems to be slightly different for intel_pstate and amd-pstate
>>>>>> currently:
>>>>>> drivers/cpufreq/amd-pstate.c
>>>>>> #define AMD_CPPC_EPP_PERFORMANCE        0x00
>>>>>> #define AMD_CPPC_EPP_BALANCE_PERFORMANCE    0x80
>>>>>> #define AMD_CPPC_EPP_BALANCE_POWERSAVE        0xBF
>>>>>> #define AMD_CPPC_EPP_POWERSAVE            0xFF
>>>>>>
>>>>>> arch/x86/include/asm/msr-index.h
>>>>>> #define HWP_EPP_PERFORMANCE        0x00
>>>>>> #define HWP_EPP_BALANCE_PERFORMANCE    0x80
>>>>>> #define HWP_EPP_BALANCE_POWERSAVE    0xC0   <------ Different from AMD_CPPC_EPP_BALANCE_POWERSAVE
>>>>>> #define HWP_EPP_POWERSAVE        0xFF
>>>>>>
>>>>>>> I think it's better to consult Rafael and Viresh about how this should
>>>>>>> evolve.
>>>>>> Yes indeed
>>>>> Maybe it's best to discuss what the goal of raw EPP number writes is to decide what to do with it.
>>>>>
>>>>> IE in intel-pstate is it for userspace to be able to actually utilize something besides the strings all the time?  Or is it just for debugging to find better values for strings in the future?
>>>>>
>>>>> If the former maybe we're better off splitting to 'energy_performance_preference' and 'energy_performance_preference_int'.
>>>>>
>>>>> If the latter maybe we're better off putting the integer writes and reads into debugfs instead and making 'energy_performance_preference' return -EINVAL while a non-predefined value is in use.
>>>
>>> In Intel case EPP values can be different based on processor. In some case they they end up sharing the same CPU model. So strings are not suitable for all cases. Also there is different preference of EPP between Chrome systems and non chrome distro. For example Chrome has some resource manager which can change and same on Intel distros with LPMD.
>>>
>>
>> Thanks for confirming it is intentional and changing it would break existing userspace.
>>
>> And FWIW even in Windows there are more than 4 situational values used like we have in Linux today.
>>
>> As the status quo is there I personally feel that we should do the exact same for other implementation of energy_performance_preference.
> 
> I still don't think this implementation is nice, for the following reasons:
> 
> 1. Users may write raw value but read string. It's odd.
> 
> 2. Sometimes a raw value is read and sometimes a character string is read.
> The userspace tool needs to adapt this.
> 
> 3. Reading and writing EPP strings is not really general in driver. It is
> more reasonable to use the userspace tool to implement it.
> 
> In order not to break existing userspace, I'll rename the file to
> 'energy_performance_preference_int' or 'energy_performance_preference_val'
> in cppc_cpufreq and only support reading and writing raw value. As for
> accepting string arguments, it's not necessary for cppc_cpufreq for now.
> It's easy to add this feature but hard to remove, so I'll leave it to the
> future if it is really needed.
> 
> As for amd-pstate and intel_pstate, you can decide how
> energy_performance_preference should evolve. But I strongly recommend
> splitting it.
> 
> Regards,
> 
> Lifeng

I agree that not being able to write-read-confirm a numeric value that
happens to match one of the strings is... ugly. I have seen cases of
userspace fighting with firmware for control of the EPP, and detecting
that's happening is difficult if there are *other* reasons you might not
get back what you just wrote.

However, the desktop userspace pretty much only uses the strings anyway,
and they serve to translate arbitrary non-linear hardware-specific
scales into something userspace can build policy on. They are somewhat
less magic than the raw values, although still, IMO, pretty magic.

The raw values don't have consistent interpretation other than that
higher numbers give monotonically increasing efficiency for that
specific core, and I use that wording particularly. Lower numbers may
not increase performance because of long-term power/thermal limits, and
on asymmetric-CPU machines, the same number may not mean the same thing
for different cores in the same CPU package.

Leaving it up to userspace means either you need machine-model-specific
golden images, manually tuned by a skilled administrator, or CPU model
checks and an out-of-tree hardware database that somehow everyone
collaborates on. That database would likely miss a lot of things that
aren't popular X86 CPUs less than 5 years old.

Thanks,

Russell
>>
> 





[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]
  Powered by Linux