[linux-pm] Re: PowerOP 0/3: System power operating point management API

tpoynor at mvista.com (Todd Poynor) · Tue Aug 16 18:40:05 2005

Dominik Brodowski wrote:

> First, the table interface you suggest is ugly. If there's indeed the need for
> such an abstraction, I'd favour something like

I'm planning to adopt the previous suggestions of an opaque data 
structure and stop trying to have any generic structure to it.  I'll try 
to leave dependency checking etc. to the upper layers as much as 
possible, since platforms vary greatly in this and so do the needs of 
different PM s/w stacks.
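
A rough sketch of the opaque-structure idea, for illustration only.  All
names here (op_point, op_backend, op_activate, demo_settings) are made
up and are not taken from the PowerOP patch; the point is just that the
generic code hands a machine-dependent blob to the platform backend
without interpreting it, and any dependency checking stays with whoever
built the blob.

```c
#include <stddef.h>

/* Hypothetical names throughout; not the PowerOP patch's actual API. */

/* An operating point as the generic layers see it: an opaque blob. */
struct op_point {
    const void *md_data;   /* machine-dependent settings, never inspected here */
    size_t      md_size;   /* size of that blob in bytes */
};

/* Hook a platform backend would supply. */
struct op_backend {
    /* Program the hardware from the blob; 0 on success, negative on error. */
    int (*set_point)(const void *md_data, size_t md_size);
};

/* Generic entry point: passes the blob through without interpreting it. */
static int op_activate(const struct op_backend *be, const struct op_point *pt)
{
    if (!be || !be->set_point || !pt || !pt->md_data)
        return -1;
    return be->set_point(pt->md_data, pt->md_size);
}

/* Example backend for an imaginary platform with two settings. */
struct demo_settings {
    unsigned int cpu_khz;
    unsigned int core_mv;
};

static struct demo_settings demo_hw;   /* stands in for real registers */

static int demo_set_point(const void *md_data, size_t md_size)
{
    const struct demo_settings *s = md_data;

    /* The only check made here: the blob has the size this backend expects. */
    if (md_size != sizeof(*s))
        return -1;
    demo_hw = *s;
    return 0;
}
```

Only the backend knows what is inside the blob, so each platform can
define whatever parameters it needs without changing the generic code.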

> Secondly, you do not address the cross-relationships between operation points
> correctly. If you change the CPU frequency, you may have to switch other
> (memory, video) settings; you might even have to validate the frequency
> settings for these or even additional reasons (thermal and battery reasons -
> ACPI _PPC).

This lowest layer basically assumes that upper-layer software has 
created an appropriate operating point and/or will call driver 
notifiers etc. as needed to adapt to the changes.  (In DPM, for 
example, we pretty much require a system designer to create operating 
points that match the h/w specs, and we don't go to great lengths to 
encode rules about this.)  Although there may be some sanity checking appropriate at the 
PowerOP level, cpufreq, DPM, etc. can for the most part continue to 
handle the larger issues of how valid operating points are constructed, 
driver callbacks, etc.  If you do want to handle various dependencies at 
the PowerOP layer then there's nothing that prevents that, but PM 
frameworks tend to embody assumptions about how frequently operating 
points will change and in what contexts (interrupt, idle...), and this 
can influence the code for such things.

> Thirdly, who is to decide on the power management settings? The first and
> intuitive answer is the kernel. Therefore, kernel-space cpufreq governors
> exist. Only under rare circumstances, you want full userspace control --
> that's what the userspace cpufreq governor is for.

Also something left to the existing upper layers; PowerOP isn't intended 
to handle any of that.  In the embedded space we usually let the system 
designer choose operating points supported by their h/w vendor and that 
match their particular system states (hardware enabled at any point in 
time, type and power/performance needs of software currently running). 
We do recommend that a userspace power policy manager be the component 
in charge of PM settings, based on messages from drivers and other apps 
on the state of the system.  That userspace component then activates 
the operating point (or, in the case of DPM, the set of operating 
points) appropriate for the current state.
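
To make the division of labour concrete, here is a minimal sketch of the
selection step such a policy manager might perform.  The states, the
operating point values, and the function name policy_select are all
hypothetical; a real manager would take its table from the system
designer and would go on to activate the chosen point through whatever
interface the kernel exposes.

```c
#include <stddef.h>

/* Hypothetical system states reported by drivers and applications. */
enum sys_state { SYS_IDLE, SYS_AUDIO, SYS_VIDEO, SYS_STATE_COUNT };

/* Hypothetical operating point: values chosen by the system designer. */
struct designer_point {
    unsigned int cpu_mhz;
    unsigned int core_mv;
};

/* Designer-supplied table: one operating point per system state. */
static const struct designer_point point_for_state[SYS_STATE_COUNT] = {
    [SYS_IDLE]  = { 100,  900 },
    [SYS_AUDIO] = { 200, 1000 },
    [SYS_VIDEO] = { 400, 1300 },
};

/* Return the point for a state, or NULL if the state is unknown. */
static const struct designer_point *policy_select(int state)
{
    if (state < 0 || state >= SYS_STATE_COUNT)
        return NULL;
    return &point_for_state[state];
}
```

Nothing in this step needs to live in the kernel, which is why it can be
left to the layers above PowerOP.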

> Fourthly, the code duplication which your implementation leads to is obvious
> for the speedstep-centrino case. 

We could move the tables of valid CPU speeds and corresponding voltages 
down to the PowerOP level, and there would probably be little 
duplication at that point.  In fact, with the current patch there's not 
a lot of duplication, since the actual MSR access was moved to PowerOP 
and PowerOP contains little else.  Both levels do know how to interpret 
the MSR format, though, and a more aggressive port to PowerOP could do 
away with that.
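
As an illustration of what "knowing the MSR format" amounts to, here is
a small sketch of the encode/decode helpers that would then live in only
one layer.  It assumes a Pentium M style layout (bus ratio in bits 15:8
with a 100 MHz bus clock, voltage ID in bits 7:0 with mV = 700 + 16 *
VID); treat that layout as an assumption to check against the processor
documentation.  The helper names are hypothetical and not from the
patch.

```c
#include <stdint.h>

/*
 * Assumed layout of the 16-bit performance control value:
 *   bits 15:8  bus ratio, with a 100 MHz bus clock
 *   bits  7:0  voltage ID, where mV = 700 + 16 * VID
 */
#define BUS_MHZ      100u
#define VID_BASE_MV  700u
#define VID_STEP_MV  16u

/* Pack a frequency/voltage pair into the assumed register format. */
static uint16_t perf_encode(unsigned int mhz, unsigned int mv)
{
    unsigned int ratio = mhz / BUS_MHZ;
    unsigned int vid = (mv - VID_BASE_MV) / VID_STEP_MV;

    return (uint16_t)(((ratio & 0xffu) << 8) | (vid & 0xffu));
}

/* Recover the CPU frequency in MHz from a register value. */
static unsigned int perf_mhz(uint16_t val)
{
    return ((val >> 8) & 0xffu) * BUS_MHZ;
}

/* Recover the core voltage in mV from a register value. */
static unsigned int perf_mv(uint16_t val)
{
    return VID_BASE_MV + (val & 0xffu) * VID_STEP_MV;
}
```

With helpers like these and the speed/voltage table kept at the PowerOP
level, the upper layer would deal only in MHz and mV and would never
need to see the register encoding.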

Your suggestions of changes to cpufreq governors and policies to handle 
governance of non-cpu-speed parameters sound interesting, and I'd be 
happy to help figure out what to do about those vs. the lower machine 
access layer I've discussed up until now.  I'll think more about this 
real soon now.  Thanks,

-- 
Todd