+ Rafael [corrected email addr]

On 14 August 2014 15:57, Ashwin Chaugule <ashwin.chaugule@xxxxxxxxxx> wrote:
>
> Hello,
>
> Apologies in advance for a lengthy cover letter. Hopefully it has all the
> required information so you don't need to read the ACPI spec. ;)
>
> This patchset introduces the ideas behind CPPC (Collaborative Processor
> Performance Control) and implements support for controlling CPU performance
> using the existing PID (Proportional-Integral-Derivative) controller (from
> intel_pstate.c) and some CPPC semantics.
>
> The patchset is not a final proposal of the CPPC implementation. I've had
> to hack some sections due to lack of hardware, details of which are in the
> Test setup section.
>
> There are several bits of information which are needed in order to make CPPC
> work great on Linux-based platforms and I'm hoping to start a wider discussion on
> how to address the missing bits. The following sections briefly introduce CPPC
> and later highlight the information which is missing.
>
> More importantly, I'm also looking for ideas on how to support CPPC in the short
> term, given that we will soon be seeing products based on ARM64 and X86 which
> support CPPC.[1] Although we may not have all the information, we could make it
> work with existing governors in a way this patchset demonstrates. Hopefully,
> this approach is acceptable for mainline inclusion in the short term.
>
> Finer details about the CPPC spec are available in the latest ACPI 5.1
> specification.[2]
>
> If these issues are being discussed on some other thread or elsewhere, or if
> someone is already working on it, please let me know. Also, please correct me if
> I have misunderstood anything.
>
> What is CPPC:
> =============
>
> CPPC is the new interface for CPU performance control between the OS and the
> platform defined in ACPI 5.0+. The interface is built on an abstract
> representation of CPU performance rather than raw frequency.
> Basic operation consists of:
>
> * Platform enumerates supported performance range to OS
>
> * OS requests desired performance level over some time window along
> with min and max instantaneous limits
>
> * Platform is free to optimize power/performance within bounds provided by OS
>
> * Platform provides telemetry back to OS on delivered performance
>
> Communication with the OS is abstracted via another ACPI construct called
> Platform Communication Channel (PCC) which is essentially a generic shared
> memory channel with doorbell interrupts going back and forth. This abstraction
> allows the “platform” for CPPC to be a variety of different entities – driver,
> firmware, BMC, etc.
>
> CPPC describes the following registers:
>
> * HighestPerformance: (read from platform)
>
> Indicates the highest level of performance the processor is theoretically
> capable of achieving, given ideal operating conditions.
>
> * Nominal Performance: (read from platform)
>
> Indicates the highest sustained performance level of the processor. This is the
> highest operating performance level the CPU is expected to deliver continuously.
>
> * LowestNonlinearPerformance: (read from platform)
>
> Indicates the lowest performance level of the processor with non-linear power
> savings.
>
> * LowestPerformance: (read from platform)
>
> Indicates the lowest performance level of the processor.
>
> * GuaranteedPerformanceRegister: (read from platform)
>
> Optional. If supported, contains register to read the current guaranteed
> performance from. This is the current max sustained performance of the CPU taking
> into account all budgeting constraints. This can change at runtime and is
> notified to the OS via ACPI notification mechanisms.
>
> * DesiredPerformanceRegister: (write to platform)
>
> Register to write desired performance level from the OS.
>
> * MinimumPerformanceRegister: (write to platform)
>
> Optional. This is the min allowable performance as requested by the OS.
>
> * MaximumPerformanceRegister: (write to platform)
>
> Optional. This is the max allowable performance as requested by the OS.
>
> * PerformanceReductionToleranceRegister: (write to platform)
>
> Optional. This is the deviation below the desired perf value as requested by the
> OS. If the Time window register (below) is supported, then this value is the min
> performance on average over the time window that the OS desires.
>
> * TimeWindowRegister: (write to platform)
>
> Optional. The OS requests desired performance over this time window.
>
> * CounterWraparoundTime: (read from platform)
>
> Optional. Min time before the performance counters wrap around.
>
> * ReferencePerformanceCounterRegister: (read from platform)
>
> A counter that increments proportionally to the reference performance of the
> processor.
>
> * DeliveredPerformanceCounterRegister: (read from platform)
>
> Delivered perf = reference perf * delta(delivered perf ctr)/delta(ref perf ctr)
>
> * PerformanceLimitedRegister: (read from platform)
>
> This is set by the platform in the event that it has to limit available
> performance due to thermal or budgeting constraints.
>
> * CPPCEnableRegister: (read/write from platform)
>
> Enable/disable CPPC
>
> * AutonomousSelectionEnable:
>
> Platform decides CPU performance level w/o OS assist.
>
> * AutonomousActivityWindowRegister:
>
> This influences the increase or decrease in CPU performance of the platform's
> autonomous selection policy.
>
> * EnergyPerformancePreferenceRegister:
>
> Provides an energy or perf bias hint to the platform when in autonomous mode.
>
> * Reference Performance: (read from platform)
>
> Indicates the rate at which the reference counter increments.
>
>
> What's missing in CPPC:
> =======================
>
> Currently CPPC makes no mention of power. However, this could be added in future
> versions of the spec.
> e.g.
> although CPPC works off of a continuous range of CPU perf levels, we could
> discretize the scale such that we only extract points where the power level changes
> substantially between CPU perf levels and export this information to the
> scheduler.
>
> What's missing in the kernel:
> =============================
>
> We may have some of this information in the scheduler, but I couldn't see a good way
> to extract it for CPPC yet.
>
> (1) An intelligent way to provide a min/max bound and a desired value for CPU
> performance.
>
> (2) A timing window for the platform to deliver requested performance within
> bounds. This could be a kind of sampling interval between consecutive reads of
> delivered CPU performance.
>
> (3) Centralized decision making by any CPU in a freq domain for all its
> siblings.
>
> The last point needs some elaboration:
>
> I see that the CPUfreq layer allows defining "related CPUs" and that we can have
> the same policy for CPUs in the same freq domain and one governor per policy.
> However, from what I could tell, there are at least 2 baked in assumptions in
> this layer which break things at least for platforms like ARM (Please correct me
> if I'm wrong!)
>
> (a) All CPUs run at the exact same max, min and cur freq.
>
> (b) Any CPU always gets exactly the freq it asked for.
>
> So, although the CPUFreq layer is capable of making somewhat centralized cpufreq
> decisions for CPUs under the same policy, it seems to be deciding things under
> the wrong/inapplicable assumptions. Moreover, only one CPU is in charge of
> policy handling at a time and the policy handling is shifted to another CPU in the
> domain, only if the former CPU is hotplugged out.
>
> Not having a proper centralized decision maker adversely affects power saving
> possibilities in platforms that can't distinguish when a CPU requests a specific
> freq and then goes to sleep.
> This potentially has the effect of keeping other
> CPUs in the domain running at a much higher frequency than required, while the
> initial requester is deep asleep.
>
> So, for point (3), I'm not sure which path we should take among the following:
>
> (I) Fix cpufreq layer and add CPPC support as a cpufreq_driver. (a) Change
> every call to get freq to make it read h/w registers and then snap value back to
> freq table. This way, cpufreq can keep its idea of freq current. However, this
> may end up waking CPUs to read counters, unless they are mem mapped. (b) Allow
> any CPU in the "related_cpus" mask to make policy decisions on behalf of
> siblings. So the policy maker switching is not tied to hotplug.
>
> (II) Not touch CPUfreq and use the PID algorithm instead, but change the busyness
> calculation to accumulate busyness values from all CPUs in common domain.
> Requires implementation of domain awareness.
>
> (III) Address these issues in the upcoming CPUfreq/CPUidle integration layer(?)
>
> (IV) Handle it in the platform or lose out. I understand this has some potential
> for adding latency to CPU freq requests so it may not be possible for all
> platforms.
>
> (V) ..?
>
> For points (1) and (2), the long term solution IMHO is to work it out along with the
> scheduler CPUFreq/CPUidle integration. But it's not clear to me what would be
> the best short term approach. I'd greatly appreciate any suggestions/comments.
> If anyone is already working on these issues, please CC me as well.
>
> Test setup:
> ===========
>
> For the sake of experiments, I used the Thinkpad x240 laptop, which advertises
> CPPC tables in its ACPI firmware. The PCC and CPPC drivers included in this
> patchset are able to parse the tables and get all the required addresses.
> However, it seems that this laptop doesn't implement PCC doorbell and the
> firmware side of CPPC. The PCC doorbell calls would just wait forever. Not sure
> what's going on there.
> So, I had to hack it and emulate what the platform
> would've done to some extent.
>
> I extracted the PID algo from intel_pstate.c and modified it with CPPC function
> wrappers. It shouldn't be hard to replace PID with anything else we think is
> suitable. In the long term, I hope we can make CPPC calls directly from the
> scheduler.
>
> There are two versions of the low level CPPC accessors. The one included in the
> patchset is how I'd imagine it would work with platforms that completely
> implement CPPC in firmware.
>
> The other version is here [5]. This should help with DT or platforms with broken
> firmware, enablement purposes etc.
>
> I ran a simple kernel compilation with intel_pstate.c and the CPPC modified
> version as the governors and saw no real difference in compile times. So no new
> overheads added.
> I verified that CPU freq requests were taken by reading out the PERF_STATUS register.
>
> [1] - See the HWP section 14.4 http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf
> [2] - http://www.uefi.org/sites/default/files/resources/ACPI_5_1release.pdf
> [3] - https://plus.google.com/+TheodoreTso/posts/2vEekAsG2QT
> [4] - https://plus.google.com/+ArjanvandeVen/posts/dLn9T4ehywL
> [5] - http://git.linaro.org/people/ashwin.chaugule/leg-kernel.git/blob/236d901d31fb06fda798880c9ca09d65123c5dd9:/drivers/cpufreq/cppc_x86.c
>
> Ashwin Chaugule (3):
>   ACPI: Add support for Platform Communication Channel
>   CPPC: Add support for Collaborative Processor Performance Control
>   CPPC: Add ACPI accessors to CPC registers
>
>  drivers/acpi/Kconfig        |  10 +
>  drivers/acpi/Makefile       |   1 +
>  drivers/acpi/pcc.c          | 301 +++++++++++++++
>  drivers/cpufreq/Kconfig     |  19 +
>  drivers/cpufreq/Makefile    |   2 +
>  drivers/cpufreq/cppc.c      | 874 ++++++++++++++++++++++++++++++++++++++++++++
>  drivers/cpufreq/cppc.h      | 181 +++++++++
>  drivers/cpufreq/cppc_acpi.c |  80 ++++
>  8 files changed, 1468 insertions(+)
>  create mode 100644 drivers/acpi/pcc.c
>  create mode 100644 drivers/cpufreq/cppc.c
>  create mode 100644 drivers/cpufreq/cppc.h
>  create mode 100644 drivers/cpufreq/cppc_acpi.c
>
> --
> 1.9.1
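
For anyone skimming the register list, the delivered-performance formula quoted
under DeliveredPerformanceCounterRegister can be sketched in C roughly as below.
This is only an illustration and not code from the patchset: the names
ctr_delta() and cppc_delivered_perf() and the counter-width parameter are
hypothetical, and the sketch assumes the counters wrap at most once between two
samples (presumably what CounterWraparoundTime lets the OS arrange).

```c
#include <stdint.h>

/*
 * Difference between two samples of a free-running counter that is
 * `width` bits wide (1..64). Assumes at most one wraparound happened
 * between the two reads.
 */
uint64_t ctr_delta(uint64_t prev, uint64_t now, unsigned int width)
{
	uint64_t mask = (width >= 64) ? UINT64_MAX
				      : ((UINT64_C(1) << width) - 1);

	/* Unsigned subtraction wraps, so masking yields the true delta. */
	return (now - prev) & mask;
}

/*
 * Delivered perf = reference perf * delta(delivered ctr) / delta(ref ctr)
 *
 * Returns 0 if the reference counter did not advance, since no rate can
 * be derived from an empty interval. The multiplication can overflow for
 * very large deltas; a real driver would need to guard against that.
 */
uint64_t cppc_delivered_perf(uint64_t ref_perf,
			     uint64_t delta_delivered,
			     uint64_t delta_ref)
{
	if (delta_ref == 0)
		return 0;

	return ref_perf * delta_delivered / delta_ref;
}
```

With a reference performance of 100, a delivered-counter delta of 1500 and a
reference-counter delta of 1000, cppc_delivered_perf() returns 150, i.e. the CPU
ran above its reference level over that interval.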