On 05/19/2016 07:12 AM, Rafael J. Wysocki wrote:
> On Thu, May 19, 2016 at 1:41 AM, Al Stone <ahs3@xxxxxxxxxx> wrote:
>> When CPPC is being used by ACPI on arm64, user space tools such as
>> cpupower report CPU frequency values from sysfs that are incorrect.
>>
>> What the driver was doing was reporting the values given by ACPI tables
>> in whatever scale was used to provide them. However, the ACPI spec
>> defines the CPPC values as unitless abstract numbers. Internal kernel
>> structures such as struct perf_cap, in contrast, expect these values
>> to be in KHz. When these struct values get reported via sysfs, the
>> user space tools also assume they are in KHz, causing them to report
>> incorrect values (for example, reporting a CPU frequency of 1MHz when
>> it should be 1.8GHz).
>>
>> While the investigation for a long term fix proceeds (several options
>> are being explored, some of which may require spec changes or other
>> much more invasive fixes), this patch forces the values read by CPPC
>> to be read in KHz, regardless of what they actually represent.
>>
>> The downside is that this approach has some assumptions:
>>
>>    (1) It relies on SMBIOS3 being used, *and* that the Max Frequency
>>    value for a processor is set to a non-zero value.
>>
>>    (2) It assumes that all processors run at the same speed, or that
>>    the CPPC values have all been scaled to reflect relative speed.
>>    This patch retrieves the first CPU Max Frequency from a type 4 DMI
>>    record that it can find. This may not be an issue, however, as a
>>    sampling of DMI data on x86 and arm64 indicates there is often only
>>    one such record regardless. Since CPPC is relatively new, it is
>>    unclear if the ACPI ASL will always be written to reflect any sort
>>    of relative performance of processors of differing speeds.
>>
>>    (3) It assumes that performance and frequency both scale linearly.
>>
>> For arm64 servers, this may be sufficient, but it does rely on
>> firmware values being set correctly. Hence, other approaches are
>> also being considered.
>>
>> This has been tested on three arm64 servers, with and without DMI, with
>> and without CPPC support.
>>
>> Changes for v3:
>>     -- Added clarifying commentary re short-term vs long-term fix (Alexey
>>        Klimov)
>>     -- Added range checking code to ensure proper arithmetic occurs,
>>        especially no division by zero (Alexey Klimov)
>>
>> Changes for v2:
>>     -- Corrected thinko: needed to have DEPENDS on DMI in Kconfig.arm,
>>        not SELECT DMI (found by build daemon)
>>
>> Signed-off-by: Al Stone <ahs3@xxxxxxxxxx>
>> ---
>>  drivers/acpi/cppc_acpi.c    | 96 ++++++++++++++++++++++++++++++++++++++++++---
>>  drivers/cpufreq/Kconfig.arm |  1 +
>>  2 files changed, 92 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
>> index 8adac69..56a46e6 100644
>> --- a/drivers/acpi/cppc_acpi.c
>> +++ b/drivers/acpi/cppc_acpi.c
>> @@ -40,6 +40,9 @@
>>  #include <linux/cpufreq.h>
>>  #include <linux/delay.h>
>>  #include <linux/ktime.h>
>> +#include <linux/dmi.h>
>> +
>> +#include <asm/unaligned.h>
>>
>>  #include <acpi/cppc_acpi.h>
>>  /*
>> @@ -709,6 +712,55 @@ static int cpc_write(struct cpc_reg *reg, u64 val)
>>  	return ret_val;
>>  }
>>
>> +static u64 cppc_dmi_khz;
>> +
>> +static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
>> +{
>> +	u16 *mhz = (u16 *)private;
>> +	const u8 *dmi_data = (const u8 *)dm;
>> +
>> +	if (dm->type == DMI_ENTRY_PROCESSOR && dm->length >= 48)
>> +		*mhz = (u16)get_unaligned((const u16 *)(dmi_data + 0x14));
>
> Is the offset standardized across architectures (I can't recall ATM)?
> If so, maybe #define a symbol for it and add a comment saying that
> next to its definition?

It's part of the SMBIOS standard. I'll fix it. I feel very silly -- I just
bugged somebody else about magic constants; karma is an amazing thing :).

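For illustration only, the named-constant approach suggested above could look
roughly like the sketch below. This is a standalone userspace sketch under
stated assumptions, not the revised patch: the SMBIOS_PROC_* names and the
smbios_proc_max_mhz() helper are hypothetical, the record is treated as a raw
byte array rather than a struct dmi_header, and the 16-bit field is assembled
byte by byte instead of calling the kernel's get_unaligned().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values mirror the constants used in the patch. */
#define SMBIOS_PROC_TYPE              4     /* Processor Information record */
#define SMBIOS_PROC_MIN_LENGTH        48    /* minimum length the patch checks for */
#define SMBIOS_PROC_MAX_SPEED_OFFSET  0x14  /* Max Speed: 16-bit field, in MHz */

/*
 * Return the Max Speed field (MHz) of a raw type 4 record, or 0 when the
 * record is of another type or too short to contain the field layout the
 * patch expects. rec[0] is the structure type and rec[1] its length.
 */
static uint16_t smbios_proc_max_mhz(const uint8_t *rec)
{
	assert(rec != NULL);

	if (rec[0] != SMBIOS_PROC_TYPE || rec[1] < SMBIOS_PROC_MIN_LENGTH)
		return 0;

	/* SMBIOS fields are little-endian; build the value one byte at a time. */
	return (uint16_t)(rec[SMBIOS_PROC_MAX_SPEED_OFFSET] |
			  (rec[SMBIOS_PROC_MAX_SPEED_OFFSET + 1] << 8));
}
```

Returning 0 for an unusable record gives the caller one well-defined "no
value" case to replace with a fallback, rather than depending on whether the
output variable happened to be written.
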
>> +}
>> +
>> +
>> +static u64 cppc_get_dmi_khz(void)
>> +{
>> +	u16 mhz;
>> +
>> +	dmi_walk(cppc_find_dmi_mhz, &mhz);
>> +
>> +	/*
>> +	 * Real stupid fallback value, just in case there is no
>> +	 * actual value set.
>> +	 */
>> +	mhz = mhz ? mhz : 1;
>> +
>> +	return (1000 * mhz);
>> +}
>> +
>> +static u64 cppc_unitless_to_khz(u64 min_in, u64 max_in, u64 val)
>
> Is the "unitless" part of the name really necessary?

Probably not essential, but descriptive. "cppc_convert_to_khz()" instead?
Or even "cppc_to_khz()"?

>
>> +{
>> +	/*
>> +	 * The incoming val should be min <= val <= max. Our
>> +	 * job is to convert that to KHz so it can be properly
>> +	 * reported to user space via cpufreq_policy.
>> +	 */
>> +	u64 curval = val;
>> +	u64 maxf = max_in;
>> +	u64 minf = min_in;
>> +
>> +	if (!cppc_dmi_khz)
>> +		cppc_dmi_khz = cppc_get_dmi_khz();
>
> I don't like hidden initializations like this if they are avoidable
> and it very much looks like it is avoidable here.

Fair enough. I'll fix it.

> Also you seem to be using the same cppc_dmi_khz value for all
> processors handled by this driver. Is that really guaranteed to be
> correct?

Yes and no. It all depends on what's in the SMBIOS tables. Let me see about
making the search for that data a bit more robust; oddly enough, I poked at
some random machines (x86 and arm64) and more often than not only found one
CPU entry in the SMBIOS data.

>> +
>> +	/* range check the input values */
>> +	curval = curval < minf ? minf : curval;
>> +	curval = curval > maxf ? maxf : curval;
>> +	minf = minf >= maxf ? maxf - 1 : minf;
>> +
>> +	return ((curval - minf) * cppc_dmi_khz) / (maxf - minf);
>> +}
>> +
>>  /**
>>   * cppc_get_perf_caps - Get a CPUs performance capabilities.
>>   * @cpunum: CPU from which to get capabilities info.
>> @@ -748,17 +800,51 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
>>  		}
>>  	}
>>
>> +	/*
>> +	 * Since these values in perf_caps will be used in setting
>> +	 * up the cpufreq policy, they must always be stored in units
>> +	 * of KHz. If they are not, user space tools will become very
>> +	 * confused since they assume these are in KHz when reading
>> +	 * sysfs.
>> +	 *
>> +	 * NB: there may be better approaches to this problem that, as
>> +	 * of this writing, are still being explored. Ideally, this is
>> +	 * a short term solution since correlating CPPC abstract values
>> +	 * with CPU frequency may or may not reflect actual performance.
>> +	 *
>> +	 * The reason longer term solutions are being explored is because
>> +	 * this solution requires we make the following assumptions:
>> +	 *
>> +	 * (1) It relies on SMBIOS3 being used, *and* that the Max
>> +	 *     Frequency value for a processor is set to a non-zero value.
>> +	 *
>> +	 * (2) It assumes that all processors run at the same speed, or
>> +	 *     that the CPPC values have all been scaled to reflect any
>> +	 *     relative differences. This code retrieves the first CPU
>> +	 *     Max Frequency from a type 4 DMI record that it can find.
>> +	 *     This may not be an issue, however, as a sampling of DMI
>> +	 *     data on x86 and arm64 indicates there is often only one
>> +	 *     such record regardless.
>> +	 *
>> +	 * (3) It assumes that performance and frequency both scale
>> +	 *     linearly.
>> +	 *
>> +	 * None of these are particularly horrible assumptions. But, they
>> +	 * are assumptions and ultimately we'd like to be able to report
>> +	 * performance without quite so many of them.
>> +	 *
>> +	 */
>>  	cpc_read(&highest_reg->cpc_entry.reg, &high);
>> -	perf_caps->highest_perf = high;
>> -
>>  	cpc_read(&lowest_reg->cpc_entry.reg, &low);
>> -	perf_caps->lowest_perf = low;
>> +
>> +	perf_caps->highest_perf = cppc_unitless_to_khz(low, high, high);
>> +	perf_caps->lowest_perf = cppc_unitless_to_khz(low, high, low);
>>
>>  	cpc_read(&ref_perf->cpc_entry.reg, &ref);
>> -	perf_caps->reference_perf = ref;
>> +	perf_caps->reference_perf = cppc_unitless_to_khz(low, high, ref);
>>
>>  	cpc_read(&nom_perf->cpc_entry.reg, &nom);
>> -	perf_caps->nominal_perf = nom;
>> +	perf_caps->nominal_perf = cppc_unitless_to_khz(low, high, nom);
>>
>>  	if (!ref)
>>  		perf_caps->reference_perf = perf_caps->nominal_perf;
>> diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
>> index 14b1f93..0573982 100644
>> --- a/drivers/cpufreq/Kconfig.arm
>> +++ b/drivers/cpufreq/Kconfig.arm
>> @@ -255,6 +255,7 @@ config ACPI_CPPC_CPUFREQ
>>  	tristate "CPUFreq driver based on the ACPI CPPC spec"
>>  	depends on ACPI
>>  	select ACPI_CPPC_LIB
>> +	select DMI
>
> What if there are unmet dependencies for DMI? Or is that not possible?
>

I'll double check this. I don't recall any, off hand.

>>  	default n
>>  	help
>>  	  This adds a CPUFreq driver which uses CPPC methods
>> --

Thanks for the feedback, Rafael!

-- 
ciao,
al
-----------------------------------
Al Stone
Software Engineer
Red Hat, Inc.
ahs3@xxxxxxxxxx
-----------------------------------