Re: [PATCH v4] Force cppc_cpufreq to report values in KHz to fix user space reporting

Hi Al,



On 7/14/2016 11:57 AM, Al Stone wrote:
> On 07/14/2016 11:39 AM, Prakash, Prashanth wrote:
>>
>> On 7/14/2016 10:15 AM, Al Stone wrote:
>>> On 07/14/2016 04:03 AM, Alexey Klimov wrote:
>>>> Hi Al,
>>>>
>>>> On Tue, Jul 12, 2016 at 11:16:11AM -0600, Al Stone wrote:
>>>>> When CPPC is being used by ACPI on arm64, user space tools such as
>>>>> cpupower report CPU frequency values from sysfs that are incorrect.
>>>>>
>>>>> What the driver was doing was reporting the values given by ACPI tables
>>>>> in whatever scale was used to provide them.  However, the ACPI spec
>>>>> defines the CPPC values as unitless abstract numbers.  Internal kernel
>>>>> structures such as struct perf_cap, in contrast, expect these values
>>>>> to be in KHz.  When these struct values get reported via sysfs, the
>>>>> user space tools also assume they are in KHz, causing them to report
>>>>> incorrect values (for example, reporting a CPU frequency of 1MHz when
>>>>> it should be 1.8GHz).
>>>>>
>>>>> While the investigation for a long term fix proceeds (several options
>>>>> are being explored, some of which may require spec changes or other
>>>>> much more invasive fixes), this patch forces the values read by CPPC
>>>>> to be read in KHz, regardless of what they actually represent.
>>>>>
>>>>> The downside is that this approach has some assumptions:
>>>>>
>>>>>    (1) It relies on SMBIOS3 being used, *and* that the Max Frequency
>>>>>    value for a processor is set to a non-zero value.
>>>>>
>>>>>    (2) It assumes that all processors run at the same speed, or that
>>>>>    the CPPC values have all been scaled to reflect relative speed.
>>>>>    This patch retrieves the largest CPU Max Frequency from a type 4 DMI
>>>>>    record that it can find.  This may not be an issue, however, as a
>>>>>    sampling of DMI data on x86 and arm64 indicates there is often only
>>>>>    one such record regardless.  Since CPPC is relatively new, it is
>>>>>    unclear if the ACPI ASL will always be written to reflect any sort
>>>>>    of relative performance of processors of differing speeds.
>>>>>
>>>>>    (3) It assumes that performance and frequency both scale linearly.
>>>>>
>>>>> For arm64 servers, this may be sufficient, but it does rely on
>>>>> firmware values being set correctly.  Hence, other approaches are
>>>>> also being considered.
>>>>>
>>>>> This has been tested on three arm64 servers, with and without DMI, with
>>>>> and without CPPC support.
>>>>>
>>>>> Changes for v4:
>>>>>     -- Replaced magic constants with #defines (Rafael Wysocki)
>>>>>     -- Renamed cppc_unitless_to_khz() to cppc_to_khz() (Rafael Wysocki)
>>>>>     -- Replaced hidden initialization with a clearer form (Rafael Wysocki)
>>>>>     -- Instead of picking up the first Max Speed value from DMI, we will
>>>>>        now get the largest Max Speed; still an approximation, but slightly
>>>>>        less subject to error (Rafael Wysocki)
>>>>>     -- Kconfig for cppc_cpufreq now depends on DMI, instead of selecting
>>>>>        it, in order to make sure DMI is set up properly (Rafael Wysocki)
>>>>>
>>>>> Changes for v3:
>>>>>     -- Added clarifying commentary re short-term vs long-term fix (Alexey
>>>>>        Klimov)
>>>>>     -- Added range checking code to ensure proper arithmetic occurs,
>>>>>        especially no division by zero (Alexey Klimov)
>>>>>
>>>>> Changes for v2:
>>>>>     -- Corrected thinko: needed to have DEPENDS on DMI in Kconfig.arm,
>>>>>        not SELECT DMI (found by build daemon)
>>>>>
>>>>> Signed-off-by: Al Stone <ahs3@xxxxxxxxxx>
>>>>> ---
>>>>>  drivers/acpi/cppc_acpi.c    | 106 +++++++++++++++++++++++++++++++++++++++++---
>>>>>  drivers/cpufreq/Kconfig.arm |   2 +-
>>>>>  2 files changed, 102 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
>>>>> index 8adac69..6e6df9c 100644
>>>>> --- a/drivers/acpi/cppc_acpi.c
>>>>> +++ b/drivers/acpi/cppc_acpi.c
>>>>> @@ -40,8 +40,18 @@
>>>>>  #include <linux/cpufreq.h>
>>>>>  #include <linux/delay.h>
>>>>>  #include <linux/ktime.h>
>>>>> +#include <linux/dmi.h>
>>>>> +
>>>>> +#include <asm/unaligned.h>
>>>>>  
>>>>>  #include <acpi/cppc_acpi.h>
>>>>> +
>>>>> +/* Minimum struct length needed for the DMI processor entry we want */
>>>>> +#define DMI_ENTRY_PROCESSOR_MIN_LENGTH	48
>>>>> +
>>>>> +/* Offset in the DMI processor structure for the max frequency */
>>>>> +#define DMI_PROCESSOR_MAX_SPEED  0x14
>>>>> +
>>>>>  /*
>>>>>   * Lock to provide mutually exclusive access to the PCC
>>>>>   * channel. e.g. When the remote updates the shared region
>>>>> @@ -709,6 +719,56 @@ static int cpc_write(struct cpc_reg *reg, u64 val)
>>>>>  	return ret_val;
>>>>>  }
>>>>>  
>>>>> +static u64 cppc_dmi_khz;
>>>>> +
>>>>> +static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
>>>>> +{
>>>>> +	const u8 *dmi_data = (const u8 *)dm;
>>>>> +	u16 *mhz = (u16 *)private;
>>>>> +
>>>>> +	if (dm->type == DMI_ENTRY_PROCESSOR &&
>>>>> +	    dm->length >= DMI_ENTRY_PROCESSOR_MIN_LENGTH) {
>>>>> +		u16 val = (u16)get_unaligned((const u16 *)
>>>>> +				(dmi_data + DMI_PROCESSOR_MAX_SPEED));
>>>>> +		*mhz = val > *mhz ? val : *mhz;
>>>>> +	}
>>>>> +}
>>>>> +
>>>>> +
>>>>> +static u64 cppc_get_dmi_khz(void)
>>>>> +{
>>>>> +	u16 mhz = 0;
>>>>> +
>>>>> +	dmi_walk(cppc_find_dmi_mhz, &mhz);
>>>>> +
>>>>> +	/*
>>>>> +	 * Real stupid fallback value, just in case there is no
>>>>> +	 * actual value set.
>>>>> +	 */
>>>>> +	mhz = mhz ? mhz : 1;
>>>>> +
>>>>> +	return (1000 * mhz);
>>>>> +}
>>>>> +
>>>>> +static u64 cppc_to_khz(u64 min_in, u64 max_in, u64 val)
>>>>> +{
>>>>> +	/*
>>>>> +	 * The incoming val should be min <= val <= max.  Our
>>>>> +	 * job is to convert that to KHz so it can be properly
>>>>> +	 * reported to user space via cpufreq_policy.
>>>>> +	 */
>>>>> +	u64 curval = val;
>>>>> +	u64 maxf = max_in;
>>>>> +	u64 minf = min_in;
>>>>> +
>>>>> +	/* range check the input values */
>>>>> +	curval = curval < minf ? minf : curval;
>>>>> +	curval = curval > maxf ? maxf : curval;
>>>>> +	minf = minf >= maxf ? maxf - 1 : minf;
>>>> In a pedantic world, the kernel should warn in dmesg about a nominal value that is
>>>> out of range, or about min being larger than max.
>>>> Not really an issue, but useful for debugging purposes.
>>> Fair enough.  I had some pr_warns/pr_info in there before while
>>> I was debugging but pulled them out; it seemed noisy at the time.
>>>
>>>>> +	return ((curval - minf) * cppc_dmi_khz) / (maxf - minf);
>>>>> +}
>>>>> +
>>>>>  /**
>>>>>   * cppc_get_perf_caps - Get a CPUs performance capabilities.
>>>>>   * @cpunum: CPU from which to get capabilities info.
>>>>> @@ -748,17 +808,53 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
>>>>>  		}
>>>>>  	}
>>>>>  
>>>>> -	cpc_read(&highest_reg->cpc_entry.reg, &high);
>>>>> -	perf_caps->highest_perf = high;
>>>>> +	/*
>>>>> +	 * Since these values in perf_caps will be used in setting
>>>>> +	 * up the cpufreq policy, they must always be stored in units
>>>>> +	 * of KHz.  If they are not, user space tools will become very
>>>>> +	 * confused since they assume these are in KHz when reading
>>>>> +	 * sysfs.
>>>>> +	 *
>>>>> +	 * NB: there may be better approaches to this problem that, as
>>>>> +	 * of this writing, are still being explored.  Ideally, this is
>>>>> +	 * a short term solution since correlating CPPC abstract values
>>>>> +	 * with CPU frequency may or may not reflect actual performance.
>>>>> +	 *
>>>>> +	 * The reason longer term solutions are being explored is because
>>>>> +	 * this solution requires we make the following assumptions:
>>>>> +	 *
>>>>> +	 *    (1) It relies on SMBIOS3 being used, *and* that the Max
>>>>> +	 *        Frequency value for a processor is set to a non-zero value.
>>>>> +	 *
>>>>> +	 *    (2) It assumes that all processors run at the same speed, or
>>>>> +	 *        that the CPPC values have all been scaled to reflect any
>>>>> +	 *        relative differences.  This code retrieves the largest CPU
>>>>> +	 *        Max Frequency from a type 4 DMI record that it can find.
>>>>> +	 *        This may not be an issue, however, as a sampling of DMI
>>>>> +	 *        data on x86 and arm64 indicates there is often only one
>>>>> +	 *        such record regardless.
>>>>> +	 *
>>>>> +	 *    (3) It assumes that performance and frequency both scale
>>>>> +	 *        linearly.
>>>>> +	 *
>>>>> +	 * None of these are particularly horrible assumptions.  But, they
>>>>> +	 * are assumptions and ultimately we'd like to be able to report
>>>>> +	 * performance without quite so many of them.
>>>>> +	 *
>>>>> +	 */
>>>>> +	cppc_dmi_khz = cppc_get_dmi_khz();
>>>>>  
>>>>> +	cpc_read(&highest_reg->cpc_entry.reg, &high);
>>>>>  	cpc_read(&lowest_reg->cpc_entry.reg, &low);
>>>>> -	perf_caps->lowest_perf = low;
>>>>> +
>>>>> +	perf_caps->highest_perf = cppc_to_khz(low, high, high);
>>>>> +	perf_caps->lowest_perf = cppc_to_khz(low, high, low);
>>>> Just to check: do I understand correctly that the cpufreq subsystem is populated
>>>> with these converted values (policy->min and max), and that cpufreq then sends a request to
>>>> set a new target_freq in converted units to CPPC, which in turn is not aware
>>>> of the conversion? Or am I missing something?
>>>> There should be a conversion back to the abstract scale for CPPC to correctly
>>>> understand and handle a request to set a new desired performance, shouldn't there?
>>> I'll go check again to be sure I didn't miss something, but my understanding
>>> is that the CPPC abstract scale that was provided in the ACPI tables would be
>>> translated to a different range modulo the frequency, with the relationships
>>> between min, max and nominal intact, and that the new range would be used for
>>> the abstract scale instead.  So as far as CPPC and cpufreq are concerned, they
>>> would just use the new range for everything -- they just operate on whatever
>>> range is provided, and are more concerned about the relationships between min,
>>> max and nominal than their actual values.
>> When we write our request to the desired perf register, the written value should be
>> in the original scale, so we need to convert it from KHz to the same scale that was
>> present in ACPI. So we have to do this conversion on all the APIs exposed by the cppc acpi
>> module.
>>
>> Given the above, it might make sense to move this logic to cpufreq/cppc_cpufreq.c,
>> so that we have a clear boundary on what is the scale being used in each module.
>> - ACPI will continue to use the original scale
>> - cppc_cpufreq will use the KHz scale as the rest of the cpufreq drivers do
>>
>> Thanks,
>> Prashanth
> Oh, bugger.  Thanks, Prashanth.  I had spaced that these could be registers,
> too, and not just integers, in the ASL.  My bad.
>
> So, yeah, that might make sense.  Another approach that might be simpler is to
> look at the sysfs read for the various files and just fix the representation
> there.  I'll take a look at both.
I think your current approach of reporting the highest/lowest in KHz to the cpufreq
framework is probably much better than fixing at the sysfs interface.
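
For reference, the arithmetic in the posted cppc_to_khz() can be replayed in user
space. This is only a sketch: cppc_to_khz_sketch() and its dmi_khz parameter are
made-up names standing in for the patch's function and the file-scope cppc_dmi_khz.

```c
#include <stdint.h>

/*
 * User-space copy of the arithmetic in the posted cppc_to_khz().
 * dmi_khz is a hypothetical parameter replacing the file-scope
 * cppc_dmi_khz variable used by the patch.
 */
static uint64_t cppc_to_khz_sketch(uint64_t min_in, uint64_t max_in,
				   uint64_t val, uint64_t dmi_khz)
{
	uint64_t curval = val;
	uint64_t maxf = max_in;
	uint64_t minf = min_in;

	/* Clamp val into [min, max], exactly as the patch does. */
	curval = curval < minf ? minf : curval;
	curval = curval > maxf ? maxf : curval;

	/* Keep the divisor non-zero when min >= max. */
	minf = minf >= maxf ? maxf - 1 : minf;

	/* Linear map: min -> 0 kHz, max -> dmi_khz. */
	return ((curval - minf) * dmi_khz) / (maxf - minf);
}
```

With low = 100, high = 300 and a DMI maximum of 1800000 KHz, this maps high to
1800000 and low to 0, so with the formula as posted the reported minimum comes
out as 0 KHz.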

One of the items on my todo list is to modify cpufreq_stats to create a pseudo freq.
table and use that to maintain the stats if the cpufreq driver (cppc) doesn't have a built-in
freq. table. For things like this, fixing at the sysfs interface can get a little ugly, whereas
implementing it on top of your current approach would be much cleaner.

There are very few interfaces in the cppc_cpufreq driver that would require an update
due to this conversion, so it should be simpler compared to sysfs as well.
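
A rough sketch of what such a conversion pair could look like, written as plain
user-space C rather than kernel code. The struct perf_scale type and the
perf_to_khz()/khz_to_perf() helpers are hypothetical names, not existing kernel
interfaces, and a simple proportional mapping is assumed here, which is not the
same formula as cppc_to_khz() in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU scaling data; the names are illustrative only. */
struct perf_scale {
	uint64_t highest_perf;	/* abstract CPPC value from the ACPI tables */
	uint64_t max_khz;	/* e.g. DMI Max Speed in MHz times 1000 */
};

/* Abstract CPPC performance -> KHz, for values handed to cpufreq. */
static uint64_t perf_to_khz(const struct perf_scale *s, uint64_t perf)
{
	assert(s->highest_perf != 0);
	return perf * s->max_khz / s->highest_perf;
}

/* KHz -> abstract CPPC performance, for writes to the desired perf register. */
static uint64_t khz_to_perf(const struct perf_scale *s, uint64_t khz)
{
	assert(s->max_khz != 0);
	return khz * s->highest_perf / s->max_khz;
}
```

The idea would be that cppc_cpufreq calls something like perf_to_khz() when
filling in the policy limits and something like khz_to_perf() before handing a
target frequency to the ACPI side, so the ACPI code only ever sees the original
scale.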

Thanks,
Prashanth