Hi Al,

On 7/14/2016 11:57 AM, Al Stone wrote:
> On 07/14/2016 11:39 AM, Prakash, Prashanth wrote:
>>
>> On 7/14/2016 10:15 AM, Al Stone wrote:
>>> On 07/14/2016 04:03 AM, Alexey Klimov wrote:
>>>> Hi Al,
>>>>
>>>> On Tue, Jul 12, 2016 at 11:16:11AM -0600, Al Stone wrote:
>>>>> When CPPC is being used by ACPI on arm64, user space tools such as
>>>>> cpupower report CPU frequency values from sysfs that are incorrect.
>>>>>
>>>>> What the driver was doing was reporting the values given by ACPI tables
>>>>> in whatever scale was used to provide them.  However, the ACPI spec
>>>>> defines the CPPC values as unitless abstract numbers.  Internal kernel
>>>>> structures such as struct perf_cap, in contrast, expect these values
>>>>> to be in KHz.  When these struct values get reported via sysfs, the
>>>>> user space tools also assume they are in KHz, causing them to report
>>>>> incorrect values (for example, reporting a CPU frequency of 1MHz when
>>>>> it should be 1.8GHz).
>>>>>
>>>>> While the investigation for a long term fix proceeds (several options
>>>>> are being explored, some of which may require spec changes or other
>>>>> much more invasive fixes), this patch forces the values read by CPPC
>>>>> to be read in KHz, regardless of what they actually represent.
>>>>>
>>>>> The downside is that this approach has some assumptions:
>>>>>
>>>>> (1) It relies on SMBIOS3 being used, *and* that the Max Frequency
>>>>>     value for a processor is set to a non-zero value.
>>>>>
>>>>> (2) It assumes that all processors run at the same speed, or that
>>>>>     the CPPC values have all been scaled to reflect relative speed.
>>>>>     This patch retrieves the largest CPU Max Frequency from a type 4 DMI
>>>>>     record that it can find.  This may not be an issue, however, as a
>>>>>     sampling of DMI data on x86 and arm64 indicates there is often only
>>>>>     one such record regardless.  Since CPPC is relatively new, it is
>>>>>     unclear if the ACPI ASL will always be written to reflect any sort
>>>>>     of relative performance of processors of differing speeds.
>>>>>
>>>>> (3) It assumes that performance and frequency both scale linearly.
>>>>>
>>>>> For arm64 servers, this may be sufficient, but it does rely on
>>>>> firmware values being set correctly.  Hence, other approaches are
>>>>> also being considered.
>>>>>
>>>>> This has been tested on three arm64 servers, with and without DMI, with
>>>>> and without CPPC support.
>>>>>
>>>>> Changes for v4:
>>>>>     -- Replaced magic constants with #defines (Rafael Wysocki)
>>>>>     -- Renamed cppc_unitless_to_khz() to cppc_to_khz() (Rafael Wysocki)
>>>>>     -- Replaced hidden initialization with a clearer form (Rafael Wysocki)
>>>>>     -- Instead of picking up the first Max Speed value from DMI, we will
>>>>>        now get the largest Max Speed; still an approximation, but slightly
>>>>>        less subject to error (Rafael Wysocki)
>>>>>     -- Kconfig for cppc_cpufreq now depends on DMI, instead of selecting
>>>>>        it, in order to make sure DMI is set up properly (Rafael Wysocki)
>>>>>
>>>>> Changes for v3:
>>>>>     -- Added clarifying commentary re short-term vs long-term fix (Alexey
>>>>>        Klimov)
>>>>>     -- Added range checking code to ensure proper arithmetic occurs,
>>>>>        especially no division by zero (Alexey Klimov)
>>>>>
>>>>> Changes for v2:
>>>>>     -- Corrected thinko: needed to have DEPENDS on DMI in Kconfig.arm,
>>>>>        not SELECT DMI (found by build daemon)
>>>>>
>>>>> Signed-off-by: Al Stone <ahs3@xxxxxxxxxx>
>>>>> ---
>>>>>  drivers/acpi/cppc_acpi.c    | 106 +++++++++++++++++++++++++++++++++++++++++---
>>>>>  drivers/cpufreq/Kconfig.arm |   2 +-
>>>>>  2 files changed, 102 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
>>>>> index 8adac69..6e6df9c 100644
>>>>> --- a/drivers/acpi/cppc_acpi.c
>>>>> +++ b/drivers/acpi/cppc_acpi.c
>>>>> @@ -40,8 +40,18 @@
>>>>>  #include <linux/cpufreq.h>
>>>>>  #include <linux/delay.h>
>>>>>  #include <linux/ktime.h>
>>>>> +#include <linux/dmi.h>
>>>>> +
>>>>> +#include <asm/unaligned.h>
>>>>>
>>>>>  #include <acpi/cppc_acpi.h>
>>>>> +
>>>>> +/* Minimum struct length needed for the DMI processor entry we want */
>>>>> +#define DMI_ENTRY_PROCESSOR_MIN_LENGTH 48
>>>>> +
>>>>> +/* Offset in the DMI processor structure for the max frequency */
>>>>> +#define DMI_PROCESSOR_MAX_SPEED 0x14
>>>>> +
>>>>>  /*
>>>>>   * Lock to provide mutually exclusive access to the PCC
>>>>>   * channel. e.g. When the remote updates the shared region
>>>>> @@ -709,6 +719,56 @@ static int cpc_write(struct cpc_reg *reg, u64 val)
>>>>>  	return ret_val;
>>>>>  }
>>>>>
>>>>> +static u64 cppc_dmi_khz;
>>>>> +
>>>>> +static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
>>>>> +{
>>>>> +	const u8 *dmi_data = (const u8 *)dm;
>>>>> +	u16 *mhz = (u16 *)private;
>>>>> +
>>>>> +	if (dm->type == DMI_ENTRY_PROCESSOR &&
>>>>> +	    dm->length >= DMI_ENTRY_PROCESSOR_MIN_LENGTH) {
>>>>> +		u16 val = (u16)get_unaligned((const u16 *)
>>>>> +				(dmi_data + DMI_PROCESSOR_MAX_SPEED));
>>>>> +		*mhz = val > *mhz ? val : *mhz;
>>>>> +	}
>>>>> +}
>>>>> +
>>>>> +
>>>>> +static u64 cppc_get_dmi_khz(void)
>>>>> +{
>>>>> +	u16 mhz = 0;
>>>>> +
>>>>> +	dmi_walk(cppc_find_dmi_mhz, &mhz);
>>>>> +
>>>>> +	/*
>>>>> +	 * Real stupid fallback value, just in case there is no
>>>>> +	 * actual value set.
>>>>> +	 */
>>>>> +	mhz = mhz ? mhz : 1;
>>>>> +
>>>>> +	return (1000 * mhz);
>>>>> +}
>>>>> +
>>>>> +static u64 cppc_to_khz(u64 min_in, u64 max_in, u64 val)
>>>>> +{
>>>>> +	/*
>>>>> +	 * The incoming val should be min <= val <= max.  Our
>>>>> +	 * job is to convert that to KHz so it can be properly
>>>>> +	 * reported to user space via cpufreq_policy.
>>>>> +	 */
>>>>> +	u64 curval = val;
>>>>> +	u64 maxf = max_in;
>>>>> +	u64 minf = min_in;
>>>>> +
>>>>> +	/* range check the input values */
>>>>> +	curval = curval < minf ? minf : curval;
>>>>> +	curval = curval > maxf ? maxf : curval;
>>>>> +	minf = minf >= maxf ? maxf - 1 : minf;
>>>> In the pedantic world kernel should warn in dmesg about nominal value that is
>>>> out of range. Or min being larger than max.
>>>> Not really an issue but for debugging purposes..
>>> Fair enough.  I had some pr_warns/pr_info in there before while
>>> I was debugging but pulled them out; it seemed noisy at the time.
>>>
>>>>> +	return ((curval - minf) * cppc_dmi_khz) / (maxf - minf);
>>>>> +}
>>>>> +
>>>>>  /**
>>>>>   * cppc_get_perf_caps - Get a CPUs performance capabilities.
>>>>>   * @cpunum: CPU from which to get capabilities info.
>>>>> @@ -748,17 +808,53 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
>>>>>  		}
>>>>>  	}
>>>>>
>>>>> -	cpc_read(&highest_reg->cpc_entry.reg, &high);
>>>>> -	perf_caps->highest_perf = high;
>>>>> +	/*
>>>>> +	 * Since these values in perf_caps will be used in setting
>>>>> +	 * up the cpufreq policy, they must always be stored in units
>>>>> +	 * of KHz.  If they are not, user space tools will become very
>>>>> +	 * confused since they assume these are in KHz when reading
>>>>> +	 * sysfs.
>>>>> +	 *
>>>>> +	 * NB: there may be better approaches to this problem that, as
>>>>> +	 * of this writing, are still being explored.  Ideally, this is
>>>>> +	 * a short term solution since correlating CPPC abstract values
>>>>> +	 * with CPU frequency may or may not reflect actual performance.
>>>>> +	 *
>>>>> +	 * The reason longer term solutions are being explored is because
>>>>> +	 * this solution requires we make the following assumptions:
>>>>> +	 *
>>>>> +	 * (1) It relies on SMBIOS3 being used, *and* that the Max
>>>>> +	 *     Frequency value for a processor is set to a non-zero value.
>>>>> +	 *
>>>>> +	 * (2) It assumes that all processors run at the same speed, or
>>>>> +	 *     that the CPPC values have all been scaled to reflect any
>>>>> +	 *     relative differences.  This code retrieves the largest CPU
>>>>> +	 *     Max Frequency from a type 4 DMI record that it can find.
>>>>> +	 *     This may not be an issue, however, as a sampling of DMI
>>>>> +	 *     data on x86 and arm64 indicates there is often only one
>>>>> +	 *     such record regardless.
>>>>> +	 *
>>>>> +	 * (3) It assumes that performance and frequency both scale
>>>>> +	 *     linearly.
>>>>> +	 *
>>>>> +	 * None of these are particularly horrible assumptions.  But, they
>>>>> +	 * are assumptions and ultimately we'd like to be able to report
>>>>> +	 * performance without quite so many of them.
>>>>> +	 *
>>>>> +	 */
>>>>> +	cppc_dmi_khz = cppc_get_dmi_khz();
>>>>>
>>>>> +	cpc_read(&highest_reg->cpc_entry.reg, &high);
>>>>>  	cpc_read(&lowest_reg->cpc_entry.reg, &low);
>>>>> -	perf_caps->lowest_perf = low;
>>>>> +
>>>>> +	perf_caps->highest_perf = cppc_to_khz(low, high, high);
>>>>> +	perf_caps->lowest_perf = cppc_to_khz(low, high, low);
>>>> Just to check. Do I understand correctly that cpufreq subsystem is populated
>>>> with these converted values (policy->min and max), then cpufreq sends request to
>>>> set new target_freq in converted units to CPPC that in its turn is not aware
>>>> about the conversion or do I miss something?
>>>> There should be conversion back to abstract scale for cppc to correctly
>>>> understand and handle request to set new desired performance, shouldn't it?
>>> I'll go check again to be sure I didn't miss something, but my understanding
>>> is that the CPPC abstract scale that was provided in the ACPI tables would be
>>> translated to a different range modulo the frequency, with the relationships
>>> between min, max and nominal intact, and that the new range would be used for
>>> the abstract scale instead.  So as far as CPPC and cpufreq are concerned, they
>>> would just use the new range for everything -- they just operate on whatever
>>> range is provided, and are more concerned about the relationships between min,
>>> max and nominal than their actual values.
>> When we write our request to the desired perf register, the written value should be
>> in the original scale, so we need to convert it from KHz to the same scale that was
>> present in ACPI. So we have to do this conversion on all the APIs exposed by cppc acpi
>> module
>>
>> Given the above, it might make sense to move this logic to cpufreq/cppc_cpufreq.c,
>> so that we have a clear boundary on what is the scale being used in each module.
>> - ACPI will continue to use the original scale
>> - cppc_cpufreq will use the KHz scale as rest of the cpufreq drivers
>>
>> Thanks,
>> Prashanth
> Oh, bugger.  Thanks, Prashanth.  I had spaced that these could be registers,
> too, and not just integers, in the ASL.  My bad.
>
> So, yeah, that might make sense.  Another approach that might be simpler is to
> look at the sysfs read for the various files and just fix the representation
> there.  I'll take a look at both.

I think your current approach of reporting the highest/lowest in KHz to
the cpufreq framework is probably much better than fixing at the sysfs
interface.

One of the items on my todo list is to modify the cpufreq_stats to create
a pseudo freq. table and use that to maintain the stats if the cpufreq
driver (cppc) doesn't have a built-in freq. table. For things like these
fixing at the sysfs interface can get a little ugly, whereas implementing
it on top of your current approach would be much cleaner.

There are very few interfaces in the cppc_cpufreq driver that would
require an update due to this conversion, so it should be simpler compared
to sysfs as well.

Thanks,
Prashanth
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
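
For reference, here is a minimal user-space sketch of the round trip discussed in this thread: abstract CPPC perf to KHz for cpufreq, and KHz back to the abstract scale before a value is written to the desired perf register. It assumes the same linear mapping as cppc_to_khz() in the quoted patch (lowest perf maps to 0, highest perf maps to the DMI max frequency). The names perf_to_khz() and khz_to_perf() are hypothetical; they are not part of the patch or the kernel, and this is only an illustration of the arithmetic.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map an abstract perf value in [lowest, highest]
 * onto [0, max_khz], mirroring the linear formula in the quoted patch.
 * Out-of-range inputs are clamped first.
 */
static uint64_t perf_to_khz(uint64_t lowest, uint64_t highest,
			    uint64_t max_khz, uint64_t perf)
{
	/* Degenerate range: avoid dividing by zero, report the maximum. */
	if (highest <= lowest)
		return max_khz;

	if (perf < lowest)
		perf = lowest;
	if (perf > highest)
		perf = highest;

	return ((perf - lowest) * max_khz) / (highest - lowest);
}

/*
 * Hypothetical inverse: map a KHz value in [0, max_khz] back onto the
 * abstract scale [lowest, highest], so the value written to the desired
 * perf register is in the scale the firmware expects.
 */
static uint64_t khz_to_perf(uint64_t lowest, uint64_t highest,
			    uint64_t max_khz, uint64_t khz)
{
	/* Degenerate inputs: fall back to the highest perf value. */
	if (highest <= lowest || max_khz == 0)
		return highest;

	if (khz > max_khz)
		khz = max_khz;

	return lowest + (khz * (highest - lowest)) / max_khz;
}
```

With lowest = 10, highest = 100 and a 1.8GHz maximum (1800000 KHz), a perf value of 55 maps to 900000 KHz and converts back to 55. Integer division truncates, so in general the round trip is only exact when the values divide evenly; a real driver would have to decide how to round.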