+linux-doc (sorry for omitting it in the first place) On Thursday, March 09, 2017 04:28:32 PM Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> > > The user/admin documentation of cpufreq is badly outdated. It > contains stale and/or inaccurate information along with things > that are not particularly useful. Also, some of the important > pieces are missing from it. > > For this reason, add a new user/admin document for cpufreq > containing current information to admin-guide and drop the old > outdated .txt documents it is replacing. > > Since there will be more PM documents in admin-guide going forward, > create a separate directory for them and put the cpufreq document > in there right away. > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> > Acked-by: Viresh Kumar <viresh.kumar@xxxxxxxxxx> > --- > > Hi Jon, > > This hasn't changed since it was sent last time as an RFC > (https://patchwork.kernel.org/patch/9583783/) and it has not received any > comments since then either, so from my perspective it is good to go. > > Please apply. > > Thanks, > Rafael > > --- > Documentation/admin-guide/index.rst | 1 > Documentation/admin-guide/pm/cpufreq.rst | 700 +++++++++++++++++++++++++++++++ > Documentation/admin-guide/pm/index.rst | 15 > Documentation/cpu-freq/boost.txt | 93 ---- > Documentation/cpu-freq/governors.txt | 301 ------------- > Documentation/cpu-freq/index.txt | 7 > Documentation/cpu-freq/user-guide.txt | 226 ---------- > 7 files changed, 716 insertions(+), 627 deletions(-) > > Index: linux-pm/Documentation/admin-guide/pm/cpufreq.rst > =================================================================== > --- /dev/null > +++ linux-pm/Documentation/admin-guide/pm/cpufreq.rst > @@ -0,0 +1,700 @@ > +.. 
|struct cpufreq_policy| replace:: :c:type:`struct cpufreq_policy <cpufreq_policy>` > + > +======================= > +CPU Performance Scaling > +======================= > + > +:: > + > + Copyright (c) 2017 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> > + > +The Concept of CPU Performance Scaling > +====================================== > + > +The majority of modern processors are capable of operating in a number of > +different clock frequency and voltage configurations, often referred to as > +Operating Performance Points or P-states (in ACPI terminology). As a rule, > +the higher the clock frequency and the higher the voltage, the more instructions > +can be retired by the CPU over a unit of time, but also the higher the clock > +frequency and the higher the voltage, the more energy is consumed over a unit of > +time (or the more power is drawn) by the CPU in the given P-state. Therefore > +there is a natural tradeoff between the CPU capacity (the number of instructions > +that can be executed over a unit of time) and the power drawn by the CPU. > + > +In some situations it is desirable or even necessary to run the program as fast > +as possible and then there is no reason to use any P-states different from the > +highest one (i.e. the highest-performance frequency/voltage configuration > +available). In some other cases, however, it may not be necessary to execute > +instructions so quickly and maintaining the highest available CPU capacity for a > +relatively long time without utilizing it entirely may be regarded as wasteful. > +It also may not be physically possible to maintain maximum CPU capacity for too > +long for thermal or power supply capacity reasons or similar. To cover those > +cases, there are hardware interfaces allowing CPUs to be switched between > +different frequency/voltage configurations or (in the ACPI terminology) to be > +put into different P-states. 
> + > +Typically, they are used along with algorithms to estimate the required CPU > +capacity, so as to decide which P-states to put the CPUs into. Of course, since > +the utilization of the system generally changes over time, that has to be done > +repeatedly on a regular basis. The activity by which this happens is referred > +to as CPU performance scaling or CPU frequency scaling (because it involves > +adjusting the CPU clock frequency). > + > + > +CPU Performance Scaling in Linux > +================================ > + > +The Linux kernel supports CPU performance scaling by means of the ``CPUFreq`` > +(CPU Frequency scaling) subsystem that consists of three layers of code: the > +core, scaling governors and scaling drivers. > + > +The ``CPUFreq`` core provides the common code infrastructure and user space > +interfaces for all platforms that support CPU performance scaling. It defines > +the basic framework in which the other components operate. > + > +Scaling governors implement algorithms to estimate the required CPU capacity. > +As a rule, each governor implements one, possibly parametrized, scaling > +algorithm. > + > +Scaling drivers talk to the hardware. They provide scaling governors with > +information on the available P-states (or P-state ranges in some cases) and > +access platform-specific hardware interfaces to change CPU P-states as requested > +by scaling governors. > + > +In principle, all available scaling governors can be used with every scaling > +driver. That design is based on the observation that the information used by > +performance scaling algorithms for P-state selection can be represented in a > +platform-independent form in the majority of cases, so it should be possible > +to use the same performance scaling algorithm implemented in exactly the same > +way regardless of which scaling driver is used. Consequently, the same set of > +scaling governors should be suitable for every supported platform. 
> + > +However, that observation may not hold for performance scaling algorithms > +based on information provided by the hardware itself, for example through > +feedback registers, as that information is typically specific to the hardware > +interface it comes from and may not be easily represented in an abstract, > +platform-independent way. For this reason, ``CPUFreq`` allows scaling drivers > +to bypass the governor layer and implement their own performance scaling > +algorithms. That is done by the ``intel_pstate`` scaling driver. > + > + > +``CPUFreq`` Policy Objects > +========================== > + > +In some cases the hardware interface for P-state control is shared by multiple > +CPUs. That is, for example, the same register (or set of registers) is used to > +control the P-state of multiple CPUs at the same time and writing to it affects > +all of those CPUs simultaneously. > + > +Sets of CPUs sharing hardware P-state control interfaces are represented by > +``CPUFreq`` as |struct cpufreq_policy| objects. For consistency, > +|struct cpufreq_policy| is also used when there is only one CPU in the given > +set. > + > +The ``CPUFreq`` core maintains a pointer to a |struct cpufreq_policy| object for > +every CPU in the system, including CPUs that are currently offline. If multiple > +CPUs share the same hardware P-state control interface, all of the pointers > +corresponding to them point to the same |struct cpufreq_policy| object. > + > +``CPUFreq`` uses |struct cpufreq_policy| as its basic data type and the design > +of its user space interface is based on the policy concept. > + > + > +CPU Initialization > +================== > + > +First of all, a scaling driver has to be registered for ``CPUFreq`` to work. > +It is only possible to register one scaling driver at a time, so the scaling > +driver is expected to be able to handle all CPUs in the system. > + > +The scaling driver may be registered before or after CPU registration. 
If > +CPUs are registered earlier, the driver core invokes the ``CPUFreq`` core to > +take note of all of the already registered CPUs during the registration of the > +scaling driver. In turn, if any CPUs are registered after the registration of > +the scaling driver, the ``CPUFreq`` core will be invoked to take note of them > +at their registration time. > + > +In any case, the ``CPUFreq`` core is invoked to take note of any logical CPU it > +has not seen so far as soon as it is ready to handle that CPU. [Note that the > +logical CPU may be a physical single-core processor, or a single core in a > +multicore processor, or a hardware thread in a physical processor or processor > +core. In what follows "CPU" always means "logical CPU" unless explicitly stated > +otherwise and the word "processor" is used to refer to the physical part > +possibly including multiple logical CPUs.] > + > +Once invoked, the ``CPUFreq`` core checks if the policy pointer is already set > +for the given CPU and if so, it skips the policy object creation. Otherwise, > +a new policy object is created and initialized, which involves the creation of > +a new policy directory in ``sysfs``, and the policy pointer corresponding to > +the given CPU is set to the new policy object's address in memory. > + > +Next, the scaling driver's ``->init()`` callback is invoked with the policy > +pointer of the new CPU passed to it as the argument. 
That callback is expected > +to initialize the performance scaling hardware interface for the given CPU (or, > +more precisely, for the set of CPUs sharing the hardware interface it belongs > +to, represented by its policy object) and, if the policy object it has been > +called for is new, to set parameters of the policy, like the minimum and maximum > +frequencies supported by the hardware, the table of available frequencies (if > +the set of supported P-states is not a continuous range), and the mask of CPUs > +that belong to the same policy (including both online and offline CPUs). That > +mask is then used by the core to populate the policy pointers for all of the > +CPUs in it. > + > +The next major initialization step for a new policy object is to attach a > +scaling governor to it (to begin with, that is the default scaling governor > +determined by the kernel configuration, but it may be changed later > +via ``sysfs``). First, a pointer to the new policy object is passed to the > +governor's ``->init()`` callback which is expected to initialize all of the > +data structures necessary to handle the given policy and, possibly, to add > +a governor ``sysfs`` interface to it. Next, the governor is started by > +invoking its ``->start()`` callback. > + > +That callback is expected to register per-CPU utilization update callbacks for > +all of the online CPUs belonging to the given policy with the CPU scheduler. > +The utilization update callbacks will be invoked by the CPU scheduler on > +important events, like task enqueue and dequeue, on every iteration of the > +scheduler tick or generally whenever the CPU utilization may change (from the > +scheduler's perspective). They are expected to carry out computations needed > +to determine the P-state to use for the given policy going forward and to > +invoke the scaling driver to make changes to the hardware in accordance with > +the P-state selection. 
The scaling driver may be invoked directly from > +scheduler context or asynchronously, via a kernel thread or workqueue, depending > +on the configuration and capabilities of the scaling driver and the governor. > + > +Similar steps are taken for policy objects that are not new, but were "inactive" > +previously, meaning that all of the CPUs belonging to them were offline. The > +only practical difference in that case is that the ``CPUFreq`` core will attempt > +to use the scaling governor previously used with the policy that became > +"inactive" (and is re-initialized now) instead of the default governor. > + > +In turn, if a previously offline CPU is being brought back online, but some > +other CPUs sharing the policy object with it are online already, there is no > +need to re-initialize the policy object at all. In that case, it only is > +necessary to restart the scaling governor so that it can take the new online CPU > +into account. That is achieved by invoking the governor's ``->stop()`` and > +``->start()`` callbacks, in this order, for the entire policy. > + > +As mentioned before, the ``intel_pstate`` scaling driver bypasses the scaling > +governor layer of ``CPUFreq`` and provides its own P-state selection algorithms. > +Consequently, if ``intel_pstate`` is used, scaling governors are not attached to > +new policy objects. Instead, the driver's ``->setpolicy()`` callback is invoked > +to register per-CPU utilization update callbacks for each policy. These > +callbacks are invoked by the CPU scheduler in the same way as for scaling > +governors, but in the ``intel_pstate`` case they both determine the P-state to > +use and change the hardware configuration accordingly in one go from scheduler > +context. 
> + > +The policy objects created during CPU initialization and other data structures > +associated with them are torn down when the scaling driver is unregistered > +(which happens when the kernel module containing it is unloaded, for example) or > +when the last CPU belonging to the given policy is unregistered. > + > + > +Policy Interface in ``sysfs`` > +============================= > + > +During the initialization of the kernel, the ``CPUFreq`` core creates a > +``sysfs`` directory (kobject) called ``cpufreq`` under > +:file:`/sys/devices/system/cpu/`. > + > +That directory contains a ``policyX`` subdirectory (where ``X`` represents an > +integer number) for every policy object maintained by the ``CPUFreq`` core. > +Each ``policyX`` directory is pointed to by ``cpufreq`` symbolic links > +under :file:`/sys/devices/system/cpu/cpuY/` (where ``Y`` represents an integer > +that may be different from the one represented by ``X``) for all of the CPUs > +associated with (or belonging to) the given policy. The ``policyX`` directories > +in :file:`/sys/devices/system/cpu/cpufreq` each contain policy-specific > +attributes (files) to control ``CPUFreq`` behavior for the corresponding policy > +objects (that is, for all of the CPUs associated with them). > + > +Some of those attributes are generic. They are created by the ``CPUFreq`` core > +and their behavior generally does not depend on what scaling driver is in use > +and what scaling governor is attached to the given policy. Some scaling drivers > +also add driver-specific attributes to the policy directories in ``sysfs`` to > +control policy-specific aspects of driver behavior. > + > +The generic attributes under :file:`/sys/devices/system/cpu/cpufreq/policyX/` > +are the following: > + > +``affected_cpus`` > + List of online CPUs belonging to this policy (i.e. sharing the hardware > + performance scaling interface represented by the ``policyX`` policy > + object). 
> + > +``bios_limit`` > + If the platform firmware (BIOS) tells the OS to apply an upper limit to > + CPU frequencies, that limit will be reported through this attribute (if > + present). > + > + The existence of the limit may be a result of some (often unintentional) > + BIOS settings, restrictions coming from a service processor or other > + BIOS/HW-based mechanisms. > + > + This does not cover ACPI thermal limitations which can be discovered > + through a generic thermal driver. > + > + This attribute is not present if the scaling driver in use does not > + support it. > + > +``cpuinfo_max_freq`` > + Maximum possible operating frequency the CPUs belonging to this policy > + can run at (in kHz). > + > +``cpuinfo_min_freq`` > + Minimum possible operating frequency the CPUs belonging to this policy > + can run at (in kHz). > + > +``cpuinfo_transition_latency`` > + The time it takes to switch the CPUs belonging to this policy from one > + P-state to another, in nanoseconds. > + > + If unknown or if known to be so high that the scaling driver does not > + work with the `ondemand`_ governor, -1 (:c:macro:`CPUFREQ_ETERNAL`) > + will be returned by reads from this attribute. > + > +``related_cpus`` > + List of all (online and offline) CPUs belonging to this policy. > + > +``scaling_available_governors`` > + List of ``CPUFreq`` scaling governors present in the kernel that can > + be attached to this policy or (if the ``intel_pstate`` scaling driver is > + in use) list of scaling algorithms provided by the driver that can be > + applied to this policy. > + > + [Note that some governors are modular and it may be necessary to load a > + kernel module for the governor held by it to become available and be > + listed by this attribute.] > + > +``scaling_cur_freq`` > + Current frequency of all of the CPUs belonging to this policy (in kHz). 
> + > + For the majority of scaling drivers, this is the frequency of the last > + P-state requested by the driver from the hardware using the scaling > + interface provided by it, which may or may not reflect the frequency > + the CPU is actually running at (due to hardware design and other > + limitations). > + > + Some scaling drivers (e.g. ``intel_pstate``) attempt to provide > + information more precisely reflecting the current CPU frequency through > + this attribute, but that still may not be the exact current CPU > + frequency as seen by the hardware at the moment. > + > +``scaling_driver`` > + The scaling driver currently in use. > + > +``scaling_governor`` > + The scaling governor currently attached to this policy or (if the > + ``intel_pstate`` scaling driver is in use) the scaling algorithm > + provided by the driver that is currently applied to this policy. > + > + This attribute is read-write and writing to it will cause a new scaling > + governor to be attached to this policy or a new scaling algorithm > + provided by the scaling driver to be applied to it (in the > + ``intel_pstate`` case), as indicated by the string written to this > + attribute (which must be one of the names listed by the > + ``scaling_available_governors`` attribute described above). > + > +``scaling_max_freq`` > + Maximum frequency the CPUs belonging to this policy are allowed to be > + running at (in kHz). > + > + This attribute is read-write and writing a string representing an > + integer to it will cause a new limit to be set (it must not be lower > + than the value of the ``scaling_min_freq`` attribute). > + > +``scaling_min_freq`` > + Minimum frequency the CPUs belonging to this policy are allowed to be > + running at (in kHz). > + > + This attribute is read-write and writing a string representing a > + non-negative integer to it will cause a new limit to be set (it must not > + be higher than the value of the ``scaling_max_freq`` attribute). 
> + > +``scaling_setspeed`` > + This attribute is functional only if the `userspace`_ scaling governor > + is attached to the given policy. > + > + It returns the last frequency requested by the governor (in kHz) or can > + be written to in order to set a new frequency for the policy. > + > + > +Generic Scaling Governors > +========================= > + > +``CPUFreq`` provides generic scaling governors that can be used with all > +scaling drivers. As stated before, each of them implements a single, possibly > +parametrized, performance scaling algorithm. > + > +Scaling governors are attached to policy objects and different policy objects > +can be handled by different scaling governors at the same time (although that > +may lead to suboptimal results in some cases). > + > +The scaling governor for a given policy object can be changed at any time with > +the help of the ``scaling_governor`` policy attribute in ``sysfs``. > + > +Some governors expose ``sysfs`` attributes to control or fine-tune the scaling > +algorithms implemented by them. Those attributes, referred to as governor > +tunables, can be either global (system-wide) or per-policy, depending on the > +scaling driver in use. If the driver requires governor tunables to be > +per-policy, they are located in a subdirectory of each policy directory. > +Otherwise, they are located in a subdirectory under > +:file:`/sys/devices/system/cpu/cpufreq/`. In either case the name of the > +subdirectory containing the governor tunables is the name of the governor > +providing them. > + > +``performance`` > +--------------- > + > +When attached to a policy object, this governor causes the highest frequency, > +within the ``scaling_max_freq`` policy limit, to be requested for that policy. > + > +The request is made once at the time the governor for the policy is set to > +``performance`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq`` > +policy limits change after that. 
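As an illustration of the ``scaling_governor`` interface described above, here is a minimal Python sketch of attaching a governor such as ``performance`` to a policy. The helper name ``set_governor`` and the idea of passing the policy directory as an argument are assumptions made for this example and are not part of the patch; on a real system the directory would be something like /sys/devices/system/cpu/cpufreq/policyX/ and the write needs root.

```python
from pathlib import Path


def set_governor(policy_dir, governor):
    """Attach `governor` to the policy rooted at `policy_dir`.

    Hypothetical helper: it only uses the scaling_available_governors
    and scaling_governor attributes described in the document.
    """
    policy = Path(policy_dir)
    # The name written must be one of those listed as available.
    available = (policy / "scaling_available_governors").read_text().split()
    if governor not in available:
        raise ValueError(f"{governor!r} not in {available}")
    (policy / "scaling_governor").write_text(governor)
    # Read the attribute back to report what is attached now.
    return (policy / "scaling_governor").read_text().strip()
```

A modular governor may have to be loaded first before it shows up in ``scaling_available_governors``, which is why the sketch checks the list instead of writing blindly.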
> + > +``powersave`` > +------------- > + > +When attached to a policy object, this governor causes the lowest frequency, > +within the ``scaling_min_freq`` policy limit, to be requested for that policy. > + > +The request is made once at the time the governor for the policy is set to > +``powersave`` and whenever the ``scaling_max_freq`` or ``scaling_min_freq`` > +policy limits change after that. > + > +``userspace`` > +------------- > + > +This governor does not do anything by itself. Instead, it allows user space > +to set the CPU frequency for the policy it is attached to by writing to the > +``scaling_setspeed`` attribute of that policy. > + > +``schedutil`` > +------------- > + > +This governor uses CPU utilization data available from the CPU scheduler. It > +generally is regarded as a part of the CPU scheduler, so it can access the > +scheduler's internal data structures directly. > + > +It runs entirely in scheduler context, although in some cases it may need to > +invoke the scaling driver asynchronously when it decides that the CPU frequency > +should be changed for a given policy (that depends on whether or not the driver > +is capable of changing the CPU frequency from scheduler context). > + > +The actions of this governor for a particular CPU depend on the scheduling class > +invoking its utilization update callback for that CPU. If it is invoked by the > +RT or deadline scheduling classes, the governor will increase the frequency to > +the allowed maximum (that is, the ``scaling_max_freq`` policy limit). In turn, > +if it is invoked by the CFS scheduling class, the governor will use the > +Per-Entity Load Tracking (PELT) metric for the root control group of the > +given CPU as the CPU utilization estimate (see the `Per-entity load tracking`_ > +LWN.net article for a description of the PELT mechanism). 
Then, the new > +CPU frequency to apply is computed in accordance with the formula > + > + f = 1.25 * ``f_0`` * ``util`` / ``max`` > + > +where ``util`` is the PELT number, ``max`` is the theoretical maximum of > +``util``, and ``f_0`` is either the maximum possible CPU frequency for the given > +policy (if the PELT number is frequency-invariant), or the current CPU frequency > +(otherwise). > + > +This governor also employs a mechanism allowing it to temporarily bump up the > +CPU frequency for tasks that have been waiting on I/O most recently, called > +"IO-wait boosting". That happens when the :c:macro:`SCHED_CPUFREQ_IOWAIT` flag > +is passed by the scheduler to the governor callback which causes the frequency > +to go up to the allowed maximum immediately and then draw back to the value > +returned by the above formula over time. > + > +This governor exposes only one tunable: > + > +``rate_limit_us`` > + Minimum time (in microseconds) that has to pass between two consecutive > + runs of governor computations (default: 1000 times the scaling driver's > + transition latency). > + > + The purpose of this tunable is to reduce the scheduler context overhead > + of the governor which might be excessive without it. > + > +This governor generally is regarded as a replacement for the older `ondemand`_ > +and `conservative`_ governors (described below), as it is simpler and more > +tightly integrated with the CPU scheduler, its overhead in terms of CPU context > +switches and similar is less significant, and it uses the scheduler's own CPU > +utilization metric, so in principle its decisions should not contradict the > +decisions made by the other parts of the scheduler. > + > +``ondemand`` > +------------ > + > +This governor uses CPU load as a CPU frequency selection metric. 
> + > +In order to estimate the current CPU load, it measures the time elapsed between > +consecutive invocations of its worker routine and computes the fraction of that > +time in which the given CPU was not idle. The ratio of the non-idle (active) > +time to the total CPU time is taken as an estimate of the load. > + > +If this governor is attached to a policy shared by multiple CPUs, the load is > +estimated for all of them and the greatest result is taken as the load estimate > +for the entire policy. > + > +The worker routine of this governor has to run in process context, so it is > +invoked asynchronously (via a workqueue) and CPU P-states are updated from > +there if necessary. As a result, the scheduler context overhead from this > +governor is minimal, but it causes additional CPU context switches to happen > +relatively often and the CPU P-state updates triggered by it can be relatively > +irregular. Also, it affects its own CPU load metric by running code that > +reduces the CPU idle time (even though the CPU idle time is only reduced very > +slightly by it). > + > +It generally selects CPU frequencies proportional to the estimated load, so that > +the value of the ``cpuinfo_max_freq`` policy attribute corresponds to the load of > +1 (or 100%), and the value of the ``cpuinfo_min_freq`` policy attribute > +corresponds to the load of 0, unless the load exceeds a (configurable) > +speedup threshold, in which case it will go straight for the highest frequency > +it is allowed to use (the ``scaling_max_freq`` policy limit). > + > +This governor exposes the following tunables: > + > +``sampling_rate`` > + This is how often the governor's worker routine should run, in > + microseconds. > + > + Typically, it is set to values of the order of 10000 (10 ms). 
Its > + default value is equal to the value of ``cpuinfo_transition_latency`` > + for each policy this governor is attached to (but since the unit here > + is greater by 1000, this means that the time represented by > + ``sampling_rate`` is 1000 times greater than the transition latency by > + default). > + > + If this tunable is per-policy, the following shell command sets the time > + represented by it to be 750 times as high as the transition latency:: > + > + # echo $(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate > + > + > +``min_sampling_rate`` > + The minimum value of ``sampling_rate``. > + > + Equal to 10000 (10 ms) if :c:macro:`CONFIG_NO_HZ_COMMON` and > + :c:data:`tick_nohz_active` are both set or to 20 times the value of > + :c:data:`jiffies` in microseconds otherwise. > + > +``up_threshold`` > + If the estimated CPU load is above this value (in percent), the governor > + will set the frequency to the maximum value allowed for the policy. > + Otherwise, the selected frequency will be proportional to the estimated > + CPU load. > + > +``ignore_nice_load`` > + If set to 1 (default 0), it will cause the CPU load estimation code to > + treat the CPU time spent on executing tasks with "nice" levels greater > + than 0 as CPU idle time. > + > + This may be useful if there are tasks in the system that should not be > + taken into account when deciding what frequency to run the CPUs at. > + Then, to make that happen it is sufficient to increase the "nice" level > + of those tasks above 0 and set this attribute to 1. > + > +``sampling_down_factor`` > + Temporary multiplier, between 1 (default) and 100 inclusive, to apply to > + the ``sampling_rate`` value if the CPU load goes above ``up_threshold``. > + > + This causes the next execution of the governor's worker routine (after > + setting the frequency to the allowed maximum) to be delayed, so the > + frequency stays at the maximum level for a longer time. 
> + > + Frequency fluctuations in some bursty workloads may be avoided this way > + at the cost of additional energy spent on maintaining the maximum CPU > + capacity. > + > +``powersave_bias`` > + Reduction factor to apply to the original frequency target of the > + governor (including the maximum value used when the ``up_threshold`` > + value is exceeded by the estimated CPU load) or sensitivity threshold > + for the AMD frequency sensitivity powersave bias driver > + (:file:`drivers/cpufreq/amd_freq_sensitivity.c`), between 0 and 1000 > + inclusive. > + > + If the AMD frequency sensitivity powersave bias driver is not loaded, > + the effective frequency to apply is given by > + > + f * (1 - ``powersave_bias`` / 1000) > + > + where f is the governor's original frequency target. The default value > + of this attribute is 0 in that case. > + > + If the AMD frequency sensitivity powersave bias driver is loaded, the > + value of this attribute is 400 by default and it is used in a different > + way. > + > + On Family 16h (and later) AMD processors there is a mechanism to get a > + measured workload sensitivity, between 0 and 100% inclusive, from the > + hardware. That value can be used to estimate how the performance of the > + workload running on a CPU will change in response to frequency changes. > + > + The performance of a workload with the sensitivity of 0 (memory-bound or > + IO-bound) is not expected to increase at all as a result of increasing > + the CPU frequency, whereas workloads with the sensitivity of 100% > + (CPU-bound) are expected to perform much better if the CPU frequency is > + increased. > + > + If the workload sensitivity is less than the threshold represented by > + the ``powersave_bias`` value, the sensitivity powersave bias driver > + will cause the governor to select a frequency lower than its original > + target, so as to avoid over-provisioning workloads that will not benefit > + from running at higher CPU frequencies. 
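To make the ``ondemand`` frequency selection and the ``powersave_bias`` reduction concrete, here is a rough Python model of the rules as they are worded above, for the case where the AMD frequency sensitivity driver is not loaded. The function name and its parameters are made up for this example, and the real governor additionally resolves the target against the driver's frequency table and the policy limits, which is not modelled here.

```python
def ondemand_target_khz(load_pct, cpuinfo_min_khz, cpuinfo_max_khz,
                        scaling_max_khz, up_threshold, powersave_bias=0):
    """Illustrative model only, not the kernel's code.

    load_pct and up_threshold are percentages, powersave_bias is in
    the 0..1000 range, frequencies are in kHz.
    """
    if not 0 <= powersave_bias <= 1000:
        raise ValueError("powersave_bias must be between 0 and 1000")
    if load_pct > up_threshold:
        # Above the speedup threshold: go straight to the allowed maximum.
        target = scaling_max_khz
    else:
        # Load 0 maps to cpuinfo_min_freq, load 100% to cpuinfo_max_freq.
        span = cpuinfo_max_khz - cpuinfo_min_khz
        target = cpuinfo_min_khz + span * load_pct // 100
    # f * (1 - powersave_bias / 1000), kept in integer arithmetic.
    return target * (1000 - powersave_bias) // 1000
```

For instance, with a 1 GHz to 3 GHz range, a load of 50% and ``powersave_bias`` set to 100, the model picks 2000000 kHz and then reduces it by 10% to 1800000 kHz.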
> + > +``conservative`` > +---------------- > + > +This governor uses CPU load as a CPU frequency selection metric. > + > +It estimates the CPU load in the same way as the `ondemand`_ governor described > +above, but the CPU frequency selection algorithm implemented by it is different. > + > +Namely, it avoids changing the frequency significantly over short time intervals > +which may not be suitable for systems with limited power supply capacity (e.g. > +battery-powered). To achieve that, it changes the frequency in relatively > +small steps, one step at a time, up or down - depending on whether or not a > +(configurable) threshold has been exceeded by the estimated CPU load. > + > +This governor exposes the following tunables: > + > +``freq_step`` > + Frequency step in percent of the maximum frequency the governor is > + allowed to set (the ``scaling_max_freq`` policy limit), between 0 and > + 100 (5 by default). > + > + This is how much the frequency is allowed to change in one go. Setting > + it to 0 will cause the default frequency step (5 percent) to be used > + and setting it to 100 effectively causes the governor to periodically > + switch the frequency between the ``scaling_min_freq`` and > + ``scaling_max_freq`` policy limits. > + > +``down_threshold`` > + Threshold value (in percent, 20 by default) used to determine the > + frequency change direction. > + > + If the estimated CPU load is greater than this value, the frequency will > + go up (by ``freq_step``). If the load is less than this value (and the > + ``sampling_down_factor`` mechanism is not in effect), the frequency will > + go down. Otherwise, the frequency will not be changed. > + > +``sampling_down_factor`` > + Frequency decrease deferral factor, between 1 (default) and 10 > + inclusive. > + > + It effectively causes the frequency to go down ``sampling_down_factor`` > + times slower than it ramps up. 
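The ``freq_step`` arithmetic for the ``conservative`` governor can be sketched in Python as follows. Both helper names are hypothetical, and the direction of the change is passed in by the caller because the threshold comparison that decides it is deliberately left out of this sketch.

```python
def conservative_step_khz(scaling_max_khz, freq_step=5):
    """Size of one frequency step in kHz (illustrative model only)."""
    if not 0 <= freq_step <= 100:
        raise ValueError("freq_step must be between 0 and 100")
    # Per the text above, 0 selects the default step of 5 percent.
    percent = freq_step if freq_step != 0 else 5
    return scaling_max_khz * percent // 100


def apply_step(cur_khz, direction, scaling_min_khz, scaling_max_khz,
               freq_step=5):
    """Move one step up (direction=+1) or down (direction=-1),
    staying within the scaling_min_freq/scaling_max_freq limits."""
    step = conservative_step_khz(scaling_max_khz, freq_step)
    new_khz = cur_khz + direction * step
    return max(scaling_min_khz, min(scaling_max_khz, new_khz))
```

With ``freq_step`` set to 100 a single step spans the whole range, which matches the remark above that the governor then effectively flips between the two policy limits.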
> + > + > +Frequency Boost Support > +======================= > + > +Background > +---------- > + > +Some processors support a mechanism to raise the operating frequency of some > +cores in a multicore package temporarily (and above the sustainable frequency > +threshold for the whole package) under certain conditions, for example if the > +whole chip is not fully utilized and below its intended thermal or power budget. > + > +Different names are used by different vendors to refer to this functionality. > +For Intel processors it is referred to as "Turbo Boost", AMD calls it > +"Turbo-Core" or (in technical documentation) "Core Performance Boost" and so on. > +As a rule, it also is implemented differently by different vendors. The simple > +term "frequency boost" is used here for brevity to refer to all of those > +implementations. > + > +The frequency boost mechanism may be either hardware-based or software-based. > +If it is hardware-based (e.g. on x86), the decision to trigger the boosting is > +made by the hardware (although in general it requires the hardware to be put > +into a special state in which it can control the CPU frequency within certain > +limits). If it is software-based (e.g. on ARM), the scaling driver decides > +whether or not to trigger boosting and when to do that. > + > +The ``boost`` File in ``sysfs`` > +------------------------------- > + > +This file is located under :file:`/sys/devices/system/cpu/cpufreq/` and controls > +the "boost" setting for the whole system. It is not present if the underlying > +scaling driver does not support the frequency boost mechanism (or supports it, > +but provides a driver-specific interface for controlling it, like > +``intel_pstate``). > + > +If the value in this file is 1, the frequency boost mechanism is enabled. 
This > +means that either the hardware can be put into states in which it is able to > +trigger boosting (in the hardware-based case), or the software is allowed to > +trigger boosting (in the software-based case). It does not mean that boosting > +is actually in use at the moment on any CPUs in the system. It only means a > +permission to use the frequency boost mechanism (which still may never be used > +for other reasons). > + > +If the value in this file is 0, the frequency boost mechanism is disabled and > +cannot be used at all. > + > +The only values that can be written to this file are 0 and 1. > + > +Rationale for Boost Control Knob > +-------------------------------- > + > +The frequency boost mechanism is generally intended to help to achieve optimum > +CPU performance on time scales below software resolution (e.g. below the > +scheduler tick interval) and it is demonstrably suitable for many workloads, but > +it may lead to problems in certain situations. > + > +For this reason, many systems make it possible to disable the frequency boost > +mechanism in the platform firmware (BIOS) setup, but that requires the system to > +be restarted for the setting to be adjusted as desired, which may not be > +practical at least in some cases. For example: > + > + 1. Boosting means overclocking the processor, although under controlled > + conditions. Generally, the processor's energy consumption increases > + as a result of increasing its frequency and voltage, even temporarily. > + That may not be desirable on systems that switch to power sources of > + limited capacity, such as batteries, so the ability to disable the boost > + mechanism while the system is running may help there (but that depends on > + the workload too). > + > + 2. In some situations deterministic behavior is more important than > + performance or energy consumption (or both) and the ability to disable > + boosting while the system is running may be useful then. > + > + 3. 
To examine the impact of the frequency boost mechanism itself, it is useful > + to be able to run tests with and without boosting, preferably without > + restarting the system in the meantime. > + > + 4. Reproducible results are important when running benchmarks. Since > + the boosting functionality depends on the load of the whole package, > + single-thread performance may vary because of it, which may sometimes lead to > + unreproducible results. That can be avoided by disabling the > + frequency boost mechanism before running benchmarks sensitive to that > + issue. > + > +Legacy AMD ``cpb`` Knob > +----------------------- > + > +The AMD powernow-k8 scaling driver supports a ``sysfs`` knob very similar to > +the global ``boost`` one. It is used for disabling/enabling the "Core > +Performance Boost" feature of some AMD processors. > + > +If present, that knob is located in every ``CPUFreq`` policy directory in > +``sysfs`` (:file:`/sys/devices/system/cpu/cpufreq/policyX/`) and is called > +``cpb``, which indicates a more fine-grained control interface. The actual > +implementation, however, works on a system-wide basis and setting that knob > +for one policy causes the same value of it to be set for all of the other > +policies at the same time. > + > +That knob is still supported on AMD processors that support its underlying > +hardware feature, but it may be configured out of the kernel (via the > +:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option) and the global > +``boost`` knob is present regardless. Thus it is always possible to use the > +``boost`` knob instead of the ``cpb`` one, which is highly recommended, as that > +is more consistent with what all of the other systems do (and the ``cpb`` knob > +may not be supported any more in the future). > + > +The ``cpb`` knob is never present for any processors without the underlying > +hardware feature (e.g. all Intel ones), even if the > +:c:macro:`CONFIG_X86_ACPI_CPUFREQ_CPB` configuration option is set.
> + > + > +.. _Per-entity load tracking: https://lwn.net/Articles/531853/ > Index: linux-pm/Documentation/admin-guide/pm/index.rst > =================================================================== > --- /dev/null > +++ linux-pm/Documentation/admin-guide/pm/index.rst > @@ -0,0 +1,15 @@ > +================ > +Power Management > +================ > + > +.. toctree:: > + :maxdepth: 2 > + > + cpufreq > + > +.. only:: subproject and html > + > + Indices > + ======= > + > + * :ref:`genindex` > Index: linux-pm/Documentation/admin-guide/index.rst > =================================================================== > --- linux-pm.orig/Documentation/admin-guide/index.rst > +++ linux-pm/Documentation/admin-guide/index.rst > @@ -60,6 +60,7 @@ configure specific aspects of kernel beh > mono > java > ras > + pm/index > > .. only:: subproject and html > > Index: linux-pm/Documentation/cpu-freq/boost.txt > =================================================================== > --- linux-pm.orig/Documentation/cpu-freq/boost.txt > +++ /dev/null > @@ -1,93 +0,0 @@ > -Processor boosting control > - > - - information for users - > - > -Quick guide for the impatient: > --------------------- > -/sys/devices/system/cpu/cpufreq/boost > -controls the boost setting for the whole system. You can read and write > -that file with either "0" (boosting disabled) or "1" (boosting allowed). > -Reading or writing 1 does not mean that the system is boosting at this > -very moment, but only that the CPU _may_ raise the frequency at it's > -discretion. > --------------------- > - > -Introduction > -------------- > -Some CPUs support a functionality to raise the operating frequency of > -some cores in a multi-core package if certain conditions apply, mostly > -if the whole chip is not fully utilized and below it's intended thermal > -budget. The decision about boost disable/enable is made either at hardware > -(e.g. x86) or software (e.g ARM). 
> -On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core", > -in technical documentation "Core performance boost". In Linux we use > -the term "boost" for convenience. > - > -Rationale for disable switch > ----------------------------- > - > -Though the idea is to just give better performance without any user > -intervention, sometimes the need arises to disable this functionality. > -Most systems offer a switch in the (BIOS) firmware to disable the > -functionality at all, but a more fine-grained and dynamic control would > -be desirable: > -1. While running benchmarks, reproducible results are important. Since > - the boosting functionality depends on the load of the whole package, > - single thread performance can vary. By explicitly disabling the boost > - functionality at least for the benchmark's run-time the system will run > - at a fixed frequency and results are reproducible again. > -2. To examine the impact of the boosting functionality it is helpful > - to do tests with and without boosting. > -3. Boosting means overclocking the processor, though under controlled > - conditions. By raising the frequency and the voltage the processor > - will consume more power than without the boosting, which may be > - undesirable for instance for mobile users. Disabling boosting may > - save power here, though this depends on the workload. > - > - > -User controlled switch > ----------------------- > - > -To allow the user to toggle the boosting functionality, the cpufreq core > -driver exports a sysfs knob to enable or disable it. There is a file: > -/sys/devices/system/cpu/cpufreq/boost > -which can either read "0" (boosting disabled) or "1" (boosting enabled). > -The file is exported only when cpufreq driver supports boosting. > -Explicitly changing the permissions and writing to that file anyway will > -return EINVAL. > - > -On supported CPUs one can write either a "0" or a "1" into this file. 
> -This will either disable the boost functionality on all cores in the > -whole system (0) or will allow the software or hardware to boost at will > -(1). > - > -Writing a "1" does not explicitly boost the system, but just allows the > -CPU to boost at their discretion. Some implementations take external > -factors like the chip's temperature into account, so boosting once does > -not necessarily mean that it will occur every time even using the exact > -same software setup. > - > - > -AMD legacy cpb switch > ---------------------- > -The AMD powernow-k8 driver used to support a very similar switch to > -disable or enable the "Core Performance Boost" feature of some AMD CPUs. > -This switch was instantiated in each CPU's cpufreq directory > -(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb". > -Though the per CPU existence hints at a more fine grained control, the > -actual implementation only supported a system-global switch semantics, > -which was simply reflected into each CPU's file. Writing a 0 or 1 into it > -would pull the other CPUs to the same state. > -For compatibility reasons this file and its behavior is still supported > -on AMD CPUs, though it is now protected by a config switch > -(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created, > -even with the config option set. > -This functionality is considered legacy and will be removed in some future > -kernel version. > - > -More fine grained boosting control > ----------------------------------- > - > -Technically it is possible to switch the boosting functionality at least > -on a per package basis, for some CPUs even per core. Currently the driver > -does not support it, but this may be implemented in the future. 
> Index: linux-pm/Documentation/cpu-freq/governors.txt > =================================================================== > --- linux-pm.orig/Documentation/cpu-freq/governors.txt > +++ /dev/null > @@ -1,301 +0,0 @@ > - CPU frequency and voltage scaling code in the Linux(TM) kernel > - > - > - L i n u x C P U F r e q > - > - C P U F r e q G o v e r n o r s > - > - - information for users and developers - > - > - > - Dominik Brodowski <linux@xxxxxxxx> > - some additions and corrections by Nico Golde <nico@xxxxxxxxx> > - Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> > - Viresh Kumar <viresh.kumar@xxxxxxxxxx> > - > - > - > - Clock scaling allows you to change the clock speed of the CPUs on the > - fly. This is a nice method to save battery power, because the lower > - the clock speed, the less power the CPU consumes. > - > - > -Contents: > ---------- > -1. What is a CPUFreq Governor? > - > -2. Governors In the Linux Kernel > -2.1 Performance > -2.2 Powersave > -2.3 Userspace > -2.4 Ondemand > -2.5 Conservative > -2.6 Schedutil > - > -3. The Governor Interface in the CPUfreq Core > - > -4. References > - > - > -1. What Is A CPUFreq Governor? > -============================== > - > -Most cpufreq drivers (except the intel_pstate and longrun) or even most > -cpu frequency scaling algorithms only allow the CPU frequency to be set > -to predefined fixed values. In order to offer dynamic frequency > -scaling, the cpufreq core must be able to tell these drivers of a > -"target frequency". So these specific drivers will be transformed to > -offer a "->target/target_index/fast_switch()" call instead of the > -"->setpolicy()" call. For set_policy drivers, all stays the same, > -though. > - > -How to decide what frequency within the CPUfreq policy should be used? > -That's done using "cpufreq governors". 
> - > -Basically, it's the following flow graph: > - > -CPU can be set to switch independently | CPU can only be set > - within specific "limits" | to specific frequencies > - > - "CPUfreq policy" > - consists of frequency limits (policy->{min,max}) > - and CPUfreq governor to be used > - / \ > - / \ > - / the cpufreq governor decides > - / (dynamically or statically) > - / what target_freq to set within > - / the limits of policy->{min,max} > - / \ > - / \ > - Using the ->setpolicy call, Using the ->target/target_index/fast_switch call, > - the limits and the the frequency closest > - "policy" is set. to target_freq is set. > - It is assured that it > - is within policy->{min,max} > - > - > -2. Governors In the Linux Kernel > -================================ > - > -2.1 Performance > ---------------- > - > -The CPUfreq governor "performance" sets the CPU statically to the > -highest frequency within the borders of scaling_min_freq and > -scaling_max_freq. > - > - > -2.2 Powersave > -------------- > - > -The CPUfreq governor "powersave" sets the CPU statically to the > -lowest frequency within the borders of scaling_min_freq and > -scaling_max_freq. > - > - > -2.3 Userspace > -------------- > - > -The CPUfreq governor "userspace" allows the user, or any userspace > -program running with UID "root", to set the CPU to a specific frequency > -by making a sysfs file "scaling_setspeed" available in the CPU-device > -directory. > - > - > -2.4 Ondemand > ------------- > - > -The CPUfreq governor "ondemand" sets the CPU frequency depending on the > -current system load. Load estimation is triggered by the scheduler > -through the update_util_data->func hook; when triggered, cpufreq checks > -the CPU-usage statistics over the last period and the governor sets the > -CPU accordingly. The CPU must have the capability to switch the > -frequency very quickly. 
> - > -Sysfs files: > - > -* sampling_rate: > - > - Measured in uS (10^-6 seconds), this is how often you want the kernel > - to look at the CPU usage and to make decisions on what to do about the > - frequency. Typically this is set to values of around '10000' or more. > - It's default value is (cmp. with users-guide.txt): transition_latency > - * 1000. Be aware that transition latency is in ns and sampling_rate > - is in us, so you get the same sysfs value by default. Sampling rate > - should always get adjusted considering the transition latency to set > - the sampling rate 750 times as high as the transition latency in the > - bash (as said, 1000 is default), do: > - > - $ echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate > - > -* sampling_rate_min: > - > - The sampling rate is limited by the HW transition latency: > - transition_latency * 100 > - > - Or by kernel restrictions: > - - If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed. > - - If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is > - used, the limits depend on the CONFIG_HZ option: > - HZ=1000: min=20000us (20ms) > - HZ=250: min=80000us (80ms) > - HZ=100: min=200000us (200ms) > - > - The highest value of kernel and HW latency restrictions is shown and > - used as the minimum sampling rate. > - > -* up_threshold: > - > - This defines what the average CPU usage between the samplings of > - 'sampling_rate' needs to be for the kernel to make a decision on > - whether it should increase the frequency. For example when it is set > - to its default value of '95' it means that between the checking > - intervals the CPU needs to be on average more than 95% in use to then > - decide that the CPU frequency needs to be increased. > - > -* ignore_nice_load: > - > - This parameter takes a value of '0' or '1'. When set to '0' (its > - default), all processes are counted towards the 'cpu utilisation' > - value. 
When set to '1', the processes that are run with a 'nice' > - value will not count (and thus be ignored) in the overall usage > - calculation. This is useful if you are running a CPU intensive > - calculation on your laptop that you do not care how long it takes to > - complete as you can 'nice' it and prevent it from taking part in the > - deciding process of whether to increase your CPU frequency. > - > -* sampling_down_factor: > - > - This parameter controls the rate at which the kernel makes a decision > - on when to decrease the frequency while running at top speed. When set > - to 1 (the default) decisions to reevaluate load are made at the same > - interval regardless of current clock speed. But when set to greater > - than 1 (e.g. 100) it acts as a multiplier for the scheduling interval > - for reevaluating load when the CPU is at its top speed due to high > - load. This improves performance by reducing the overhead of load > - evaluation and helping the CPU stay at its top speed when truly busy, > - rather than shifting back and forth in speed. This tunable has no > - effect on behavior at lower speeds/lower CPU loads. > - > -* powersave_bias: > - > - This parameter takes a value between 0 to 1000. It defines the > - percentage (times 10) value of the target frequency that will be > - shaved off of the target. For example, when set to 100 -- 10%, when > - ondemand governor would have targeted 1000 MHz, it will target > - 1000 MHz - (10% of 1000 MHz) = 900 MHz instead. This is set to 0 > - (disabled) by default. > - > - When AMD frequency sensitivity powersave bias driver -- > - drivers/cpufreq/amd_freq_sensitivity.c is loaded, this parameter > - defines the workload frequency sensitivity threshold in which a lower > - frequency is chosen instead of ondemand governor's original target. 
> - The frequency sensitivity is a hardware reported (on AMD Family 16h > - Processors and above) value between 0 to 100% that tells software how > - the performance of the workload running on a CPU will change when > - frequency changes. A workload with sensitivity of 0% (memory/IO-bound) > - will not perform any better on higher core frequency, whereas a > - workload with sensitivity of 100% (CPU-bound) will perform better > - higher the frequency. When the driver is loaded, this is set to 400 by > - default -- for CPUs running workloads with sensitivity value below > - 40%, a lower frequency is chosen. Unloading the driver or writing 0 > - will disable this feature. > - > - > -2.5 Conservative > ----------------- > - > -The CPUfreq governor "conservative", much like the "ondemand" > -governor, sets the CPU frequency depending on the current usage. It > -differs in behaviour in that it gracefully increases and decreases the > -CPU speed rather than jumping to max speed the moment there is any load > -on the CPU. This behaviour is more suitable in a battery powered > -environment. The governor is tweaked in the same manner as the > -"ondemand" governor through sysfs with the addition of: > - > -* freq_step: > - > - This describes what percentage steps the cpu freq should be increased > - and decreased smoothly by. By default the cpu frequency will increase > - in 5% chunks of your maximum cpu frequency. You can change this value > - to anywhere between 0 and 100 where '0' will effectively lock your CPU > - at a speed regardless of its load whilst '100' will, in theory, make > - it behave identically to the "ondemand" governor. > - > -* down_threshold: > - > - Same as the 'up_threshold' found for the "ondemand" governor but for > - the opposite direction. For example when set to its default value of > - '20' it means that if the CPU usage needs to be below 20% between > - samples to have the frequency decreased. 
> - > -* sampling_down_factor: > - > - Similar functionality as in "ondemand" governor. But in > - "conservative", it controls the rate at which the kernel makes a > - decision on when to decrease the frequency while running in any speed. > - Load for frequency increase is still evaluated every sampling rate. > - > - > -2.6 Schedutil > -------------- > - > -The "schedutil" governor aims at better integration with the Linux > -kernel scheduler. Load estimation is achieved through the scheduler's > -Per-Entity Load Tracking (PELT) mechanism, which also provides > -information about the recent load [1]. This governor currently does > -load based DVFS only for tasks managed by CFS. RT and DL scheduler tasks > -are always run at the highest frequency. Unlike all the other > -governors, the code is located under the kernel/sched/ directory. > - > -Sysfs files: > - > -* rate_limit_us: > - > - This contains a value in microseconds. The governor waits for > - rate_limit_us time before reevaluating the load again, after it has > - evaluated the load once. > - > -For an in-depth comparison with the other governors refer to [2]. > - > - > -3. The Governor Interface in the CPUfreq Core > -============================================= > - > -A new governor must register itself with the CPUfreq core using > -"cpufreq_register_governor". The struct cpufreq_governor, which has to > -be passed to that function, must contain the following values: > - > -governor->name - A unique name for this governor. > -governor->owner - .THIS_MODULE for the governor module (if appropriate). > - > -plus a set of hooks to the functions implementing the governor's logic. 
> - > -The CPUfreq governor may call the CPU processor driver using one of > -these two functions: > - > -int cpufreq_driver_target(struct cpufreq_policy *policy, > - unsigned int target_freq, > - unsigned int relation); > - > -int __cpufreq_driver_target(struct cpufreq_policy *policy, > - unsigned int target_freq, > - unsigned int relation); > - > -target_freq must be within policy->min and policy->max, of course. > -What's the difference between these two functions? When your governor is > -in a direct code path of a call to governor callbacks, like > -governor->start(), the policy->rwsem is still held in the cpufreq core, > -and there's no need to lock it again (in fact, this would cause a > -deadlock). So use __cpufreq_driver_target only in these cases. In all > -other cases (for example, when there's a "daemonized" function that > -wakes up every second), use cpufreq_driver_target to take policy->rwsem > -before the command is passed to the cpufreq driver. > - > -4. References > -============= > - > -[1] Per-entity load tracking: https://lwn.net/Articles/531853/ > -[2] Improvements in CPU frequency management: https://lwn.net/Articles/682391/ > - > Index: linux-pm/Documentation/cpu-freq/user-guide.txt > =================================================================== > --- linux-pm.orig/Documentation/cpu-freq/user-guide.txt > +++ /dev/null > @@ -1,226 +0,0 @@ > - CPU frequency and voltage scaling code in the Linux(TM) kernel > - > - > - L i n u x C P U F r e q > - > - U S E R G U I D E > - > - > - Dominik Brodowski <linux@xxxxxxxx> > - > - > - > - Clock scaling allows you to change the clock speed of the CPUs on the > - fly. This is a nice method to save battery power, because the lower > - the clock speed, the less power the CPU consumes. > - > - > -Contents: > ---------- > -1. Supported Architectures and Processors > -1.1 ARM and ARM64 > -1.2 x86 > -1.3 sparc64 > -1.4 ppc > -1.5 SuperH > -1.6 Blackfin > - > -2. "Policy" / "Governor"? 
> -2.1 Policy > -2.2 Governor > - > -3. How to change the CPU cpufreq policy and/or speed > -3.1 Preferred interface: sysfs > - > - > - > -1. Supported Architectures and Processors > -========================================= > - > -1.1 ARM and ARM64 > ------------------ > - > -Almost all ARM and ARM64 platforms support CPU frequency scaling. > - > -1.2 x86 > -------- > - > -The following processors for the x86 architecture are supported by cpufreq: > - > -AMD Elan - SC400, SC410 > -AMD mobile K6-2+ > -AMD mobile K6-3+ > -AMD mobile Duron > -AMD mobile Athlon > -AMD Opteron > -AMD Athlon 64 > -Cyrix Media GXm > -Intel mobile PIII and Intel mobile PIII-M on certain chipsets > -Intel Pentium 4, Intel Xeon > -Intel Pentium M (Centrino) > -National Semiconductors Geode GX > -Transmeta Crusoe > -Transmeta Efficeon > -VIA Cyrix 3 / C3 > -various processors on some ACPI 2.0-compatible systems [*] > -And many more > - > -[*] Only if "ACPI Processor Performance States" are available > -to the ACPI<->BIOS interface. > - > - > -1.3 sparc64 > ------------ > - > -The following processors for the sparc64 architecture are supported by > -cpufreq: > - > -UltraSPARC-III > - > - > -1.4 ppc > -------- > - > -Several "PowerBook" and "iBook2" notebooks are supported. > - > - > -1.5 SuperH > ----------- > - > -All SuperH processors supporting rate rounding through the clock > -framework are supported by cpufreq. > - > -1.6 Blackfin > ------------- > - > -The following Blackfin processors are supported by cpufreq: > - > -BF522, BF523, BF524, BF525, BF526, BF527, Rev 0.1 or higher > -BF531, BF532, BF533, Rev 0.3 or higher > -BF534, BF536, BF537, Rev 0.2 or higher > -BF561, Rev 0.3 or higher > -BF542, BF544, BF547, BF548, BF549, Rev 0.1 or higher > - > - > -2. "Policy" / "Governor" ? > -========================== > - > -Some CPU frequency scaling-capable processor switch between various > -frequencies and operating voltages "on the fly" without any kernel or > -user involvement. 
This guarantees very fast switching to a frequency > -which is high enough to serve the user's needs, but low enough to save > -power. > - > - > -2.1 Policy > ----------- > - > -On these systems, all you can do is select the lower and upper > -frequency limit as well as whether you want more aggressive > -power-saving or more instantly available processing power. > - > - > -2.2 Governor > ------------- > - > -On all other cpufreq implementations, these boundaries still need to > -be set. Then, a "governor" must be selected. Such a "governor" decides > -what speed the processor shall run within the boundaries. One such > -"governor" is the "userspace" governor. This one allows the user - or > -a yet-to-implement userspace program - to decide what specific speed > -the processor shall run at. > - > - > -3. How to change the CPU cpufreq policy and/or speed > -==================================================== > - > -3.1 Preferred Interface: sysfs > ------------------------------- > - > -The preferred interface is located in the sysfs filesystem. If you > -mounted it at /sys, the cpufreq interface is located in a subdirectory > -"cpufreq" within the cpu-device directory > -(e.g. /sys/devices/system/cpu/cpu0/cpufreq/ for the first CPU). > - > -affected_cpus : List of Online CPUs that require software > - coordination of frequency. > - > -cpuinfo_cur_freq : Current frequency of the CPU as obtained from > - the hardware, in KHz. This is the frequency > - the CPU actually runs at. > - > -cpuinfo_min_freq : this file shows the minimum operating > - frequency the processor can run at(in kHz) > - > -cpuinfo_max_freq : this file shows the maximum operating > - frequency the processor can run at(in kHz) > - > -cpuinfo_transition_latency The time it takes on this CPU to > - switch between two frequencies in nano > - seconds. If unknown or known to be > - that high that the driver does not > - work with the ondemand governor, -1 > - (CPUFREQ_ETERNAL) will be returned. 
> - Using this information can be useful > - to choose an appropriate polling > - frequency for a kernel governor or > - userspace daemon. Make sure to not > - switch the frequency too often > - resulting in performance loss. > - > -related_cpus : List of Online + Offline CPUs that need software > - coordination of frequency. > - > -scaling_available_frequencies : List of available frequencies, in KHz. > - > -scaling_available_governors : this file shows the CPUfreq governors > - available in this kernel. You can see the > - currently activated governor in > - > -scaling_cur_freq : Current frequency of the CPU as determined by > - the governor and cpufreq core, in KHz. This is > - the frequency the kernel thinks the CPU runs > - at. > - > -scaling_driver : this file shows what cpufreq driver is > - used to set the frequency on this CPU > - > -scaling_governor, and by "echoing" the name of another > - governor you can change it. Please note > - that some governors won't load - they only > - work on some specific architectures or > - processors. > - > -scaling_min_freq and > -scaling_max_freq show the current "policy limits" (in > - kHz). By echoing new values into these > - files, you can change these limits. > - NOTE: when setting a policy you need to > - first set scaling_max_freq, then > - scaling_min_freq. > - > -scaling_setspeed This can be read to get the currently programmed > - value by the governor. This can be written to > - change the current frequency for a group of > - CPUs, represented by a policy. This is supported > - currently only by the userspace governor. > - > -bios_limit : If the BIOS tells the OS to limit a CPU to > - lower frequencies, the user can read out the > - maximum available frequency from this file. > - This typically can happen through (often not > - intended) BIOS settings, restrictions > - triggered through a service processor or other > - BIOS/HW based implementations. 
> - This does not cover thermal ACPI limitations > - which can be detected through the generic > - thermal driver. > - > -If you have selected the "userspace" governor which allows you to > -set the CPU operating frequency to a specific value, you can read out > -the current frequency in > - > -scaling_setspeed. By "echoing" a new frequency into this > - you can change the speed of the CPU, > - but only within the limits of > - scaling_min_freq and scaling_max_freq. > Index: linux-pm/Documentation/cpu-freq/index.txt > =================================================================== > --- linux-pm.orig/Documentation/cpu-freq/index.txt > +++ linux-pm/Documentation/cpu-freq/index.txt > @@ -21,8 +21,6 @@ Documents in this directory: > > amd-powernow.txt - AMD powernow driver specific file. > > -boost.txt - Frequency boosting support. > - > core.txt - General description of the CPUFreq core and > of CPUFreq notifiers. > > @@ -32,17 +30,12 @@ cpufreq-nforce2.txt - nVidia nForce2 pla > > cpufreq-stats.txt - General description of sysfs cpufreq stats. > > -governors.txt - What are cpufreq governors and how to > - implement them? > - > index.txt - File index, Mailing list and Links (this document) > > intel-pstate.txt - Intel pstate cpufreq driver specific file. > > pcc-cpufreq.txt - PCC cpufreq driver specific file. > > -user-guide.txt - User Guide to CPUFreq > - > > Mailing List > ------------
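
As a side note for readers of the quoted document: the semantics of the global ``boost`` file described above (absent when the scaling driver does not expose it, otherwise holding 0 or 1, with only those two values accepted on write) can be illustrated from user space with a small sketch. This is not part of the patch. The helper names read_boost() and write_boost() are made up for illustration, and the directory is passed in as a parameter so the logic can be tried against a scratch directory instead of the real sysfs.

```python
from pathlib import Path
from typing import Optional

# Directory named in the quoted document. Everything else here is an
# illustrative assumption, not an interface provided by the kernel.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpufreq")


def read_boost(cpufreq_dir: Path = CPUFREQ_DIR) -> Optional[bool]:
    """Return the boost setting, or None if the boost file is not present.

    True only means boosting is permitted, not that any CPU is
    boosting right now.
    """
    boost = cpufreq_dir / "boost"
    if not boost.exists():
        return None
    return boost.read_text().strip() == "1"


def write_boost(enabled: bool, cpufreq_dir: Path = CPUFREQ_DIR) -> None:
    """Write 1 or 0 to the boost file (on a real system this needs root)."""
    boost = cpufreq_dir / "boost"
    if not boost.exists():
        # The file is only exported when the driver supports the knob.
        raise FileNotFoundError(f"{boost} is not present")
    boost.write_text("1" if enabled else "0")
```

For example, a benchmark wrapper could call read_boost() to record the current setting, call write_boost(False) before the run, and restore the recorded value afterwards. That corresponds to item 4 in the rationale list of the new document.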