On 8/27/06, Eugeny S. Mints <eugeny.mints at gmail.com> wrote:
> 2006/8/26, David Singleton <daviado at gmail.com>:
> > On 8/19/06, Dave Jones <davej at redhat.com> wrote:
> > > On Sat, Aug 19, 2006 at 08:20:45PM -0700, David Singleton wrote:
> > >
> > > > If I had all the existing cpufreq tables transformed
> > > > into operating points I could make a patch that would remove
> > > > the bulk of cpufreq code from the kernel and you'd have
> > > > pretty much the same functionality without the maintenance
> > > > issues the added layers and complexity bring.
> > >
> > > If this is going to fly at all, I think that's where we need to be headed.
> > > Having two parts of the kernel doing the same thing just seems
> > > very wrong to me.
> > >
> > > The other alternative as suggested earlier this week would be architectures
> > > getting to 'opt out' of powerop for their cpufreq drivers where it doesn't
> > > necessarily bring anything but the layer of indirection.
> > >
> > > I'm about to disappear for two weeks for a much needed vacation, but
> > > I'll be interested to see other folks' comments/opinions on this
> > > when I get back.
> >
> [snip]
> > 1) I believe I now have the right kernel interface for a common
> > power management infrastructure.
> >
> OpPoint continues to focus on user space interface development for
> power management; in contrast to that, there seems to be an agreement in
> the community to defer this integration due to in fact quite a lot of
> open/undiscussed and complex questions about this integration and
> instead to focus on getting a consensus on operating point structure
> definition and methods to work with the structure instances.

Actually, OpPoint is focusing on all the interfaces: the user-kernel
interface, the interface between the architecture independent kernel
and power management, and the interface between the power management
framework and the architecture/platform specific code.
>
> OpPoint continues to focus on integration with CPUFreq in a manner
> which was outlined as unacceptable during recent discussions on the
> list - removing the concept of an in-kernel governor and most of the
> CPUFreq feature code.

The point of OpPoint is to show that a unified power management
infrastructure is possible and that bolting on another power management
infrastructure to the kernel is not the right approach.

OpPoint is not trying to replace cpufreq. It's trying to unify all
the power management infrastructures into a single infrastructure.
OpPoint uses the cpufreq notifier infrastructure to do both operating
point transition and driver scaling notification, and it performs
the same basic functions as cpufreq, without the policy and governor code.

It is also performing all the same Dynamic Power Management functionality
on the pxa27x mainstone. The point is one infrastructure can support them.
And with the new oppointd power daemon it is performing all the same
functions as cpuspeed did on my laptop, just with a lot less code in
the kernel.

>
> OpPoint continues to develop userspace interfaces and integration
> based on operating point definition for which Matt and I posted
> issues/questions several times and the posts have been left without a
> reply.

Sorry, I'm having a hard time keeping up with all the email threads.

>
> Below I'm trying to summarize all issues I see with OpPoint approach
> sometimes using terms defined in PowerOP approach (for example layer
> names).
>
> 'struct powerop' definition
> ------------------------------------
> - frequency, voltage fields are arch specific: not to mention any
> complex embedded case but current definition and OpPoint
> implementation does not work even for x86 SMP case.

Actually, the frequency, voltage and latency fields are architecture
independent and are necessary pieces of information that any power
manager must have.

You are right, I have not yet put in the additional layer to support SMP systems.
That is one of the pieces I'm still working on.

>
> - latency is not an attribute of a certain operating point but a function of
> two arguments - current operating point and a point we are going to
> switch to. Therefore latency just does not belong to 'struct powerop'

I disagree.

>
> - all hooks are redundant: the hooks are the same for all operating points
> until we come to the integration with suspend/resume. But we believe the
> integration needs more investigation in the first place and in the second we
> feel like the integration may be handled on PM Core layer instead
> of having per operating point hooks

The hooks are not redundant nor the same for all operating points.
Each operating point defines its prepare, transition, and finish
functions for the hooks. And different types of operating points may
have completely different functions in those hooks, on the same platform.

>
> - prepare_transition and finish_transition may be moved even below PM Core to
> clock/voltage framework; needs more careful investigation though

I disagree. Both the pm suspend and cpufreq code have them in exactly
the same place.

>
> - md_data has an issue from OO design paradigm perspective. OpPoint
> requires an entity above PowerOP to know internals of arch md_data (see
> centrino-dynamic-powerop.c implementation) and thus requires an arch
> dependent header file to be included in the code which can be
> implemented in arch independent manner. That would be fine if there was
> no solution to achieve required functionality without such a hack but
> PowerOP provides such approach by dereferencing power parameters by
> name. File which implements operating points registration in PowerOP
> approach does not include any header file from include/asm-* subtrees.

No, the md_data is the opaque pointer into architecture dependent data.
The power management infrastructure doesn't need to know what data is
linked into md_data, just as drivers have driver specific structs that
are opaque to the upper layers of software.

>
> All further pieces proposed by OpPoint are based on the above incorrect
> design of the main structure and therefore have issues.

wow.

>
> integration with suspend/resume
> -------------------------------
> - mixing system state and operating point concept (different points
> may correspond to a sleep/standby system state)

The pxa27x code shows that indeed there is more than one suspend
state, which is why the operating point model works so well on both my
centrino laptop and my pxa27x mainstone running the same oppointd
power daemon.

>
> - legacy PM states are redefined via new OpPoint interface but do not
> use it (explicit 'if' statements in legacy pm code instead of OpPoint
> hooks utilization)

The enter_state code could be merged into the pm_change code, or vice
versa. I haven't had time to make it really unified and pretty.

>
> - names for operating points presented in the original letter below
> implicitly assumed the points are ranged by some order (now it is from
> the highest [power consumption] to the lowest). However having many
> more power parameters than just one freq and one voltage does not
> allow to range the points in such a way and a string name without
> knowledge of a particular power parameter values is not sufficient

That's not quite correct. The ordering of names, lowest to highest,
allows the power manager daemon to cover most of the use cases right
out of the box. It's performing the same functions on both my centrino
laptop and the pxa27x mainstone right now without any changes to either
the power manager or the power management config file.

One of the next boards I'm working on has different operating points
at the same frequency, but different voltages.
All that is really required to support this is a plugin to the power
manager that understands the different operating points so it can best
choose when to transition to each point. Custom plugins that let the
power manager deal with the unique set of operating points on a
particular platform are one of the really attractive parts of OpPoint.
It won't have to be woven into the kernel.

> (even in x86 SMP case: not to mention it's hard for me to express SMP
> case in current OpPoint terms but what are names and how to
> distinguish/range 2 CPUs system states corresponded to 'highest point
> for CPU0 + medium for CPU1' against 'low for CPU0 + high for CPU1' ?)

I'm still working on the SMP case. It's not that I'm ignoring it.
Give me a few more days.

>
> - no example of (at least optional) capability to export information about
> particular power parameter is presented while it was obviously
> highlighted by embedded community that it is a must

Which parameters besides frequency, voltage and latency are required to
be exported to the power manager?

>
> - direct utilization of PM internal structure 'pm_state' instead of an attempt
> of an API
>
> cpufreq core and a cpufreq driver/OpPoint integration
> -------------------------------
> - integration with legacy cpufreq interface is completely missing in both arch
> (x86 and pxa) examples. If OpPoint was a universal approach it would
> allow to build different interfaces on top of it. In this case you can
> propose more optimized/improved interface if you feel existing
> interface has issues leaving existing interface as a [configurable]
> option and remove it when agreed.

I'm sorry, I don't understand that statement.

I'm still opposed to dynamic, on-the-fly construction of operating points.
It's really dangerous. The hardware vendors want it so that new hardware
doesn't have to wait for software before they can sell it.
The cpufreq structure of defining and validating operating points
before being integrated into the kernel is the correct way to do it,
in my opinion.

>
> - while clear design and interfaces are outlined for so called PM Core
> layer by PowerOP approach this layer is not addressed by OpPoints in
> any way

Correct. They are a different design.

>
> - a cpufreq driver still should contain code to access arch hardware
> while the functionality of cpufreq driver falls into PM Core layer and
> there is no longer reason to have the functionality related to cpufreq
> concept

Is this a statement about PowerOP? OpPoint doesn't use the PowerOP
PM Core layer definition. OpPoint only has 3 layers:

1) The user space power manager and user-kernel interface.
2) The architecture independent layer between the kernel and the
   power management infrastructure.
3) The architecture dependent layer that does the work and has to
   touch all the hardware.

The architecture dependent layer is the piece where the hardware
specific operating points and functions to transition to the operating
points are defined. This is also why it's so simple to add new
architecture and platform support. All that is needed is the
architecture dependent portion to support a new platform.

>
> - no any integration with clock/voltage framework. Integral solution
> which includes Clock/voltage framework just saves more power [period].

Not so. The mainstone uses the existing <linux/clk.h> clock framework,
and it must, since it supports so many different clocks to transition
to a new operating point.

I'm still open to integrating with any new voltage framework, I just
haven't seen it yet. I also don't believe it will be a problem
integrating with a voltage framework. The voltage framework will be
needed by the architecture dependent pieces of power management and a
common voltage framework will just make it easier.
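
To make the three layers above a little more concrete, here is a minimal
user-space sketch in C of how an architecture dependent layer might define
operating points and their hooks, and how an architecture independent layer
could drive a transition without looking inside md_data. This is not the
code from the OpPoint patches: struct op_point, op_change(), the fake_*
names, and the frequency, voltage and register values are all assumptions
made up for illustration.

```c
#include <assert.h>

/* Hypothetical model of an operating point; names are illustrative only. */
struct op_point {
    const char *name;
    unsigned int freq_khz;    /* target CPU frequency (made-up values) */
    unsigned int voltage_mv;  /* target core voltage (made-up values) */
    unsigned int latency_us;  /* time needed to enter this point */
    int (*prepare)(const struct op_point *from, const struct op_point *to);
    int (*transition)(const struct op_point *from, const struct op_point *to);
    int (*finish)(const struct op_point *from, const struct op_point *to);
    void *md_data;            /* opaque architecture dependent data */
};

/* ---- architecture dependent layer (a fake platform) ---- */
struct fake_md {
    unsigned int ctl_value;   /* value for an imaginary control register */
};

unsigned int fake_ctl_reg;    /* stands in for real hardware */
int fake_notifications;       /* counts prepare/finish notifications */

static int fake_prepare(const struct op_point *from, const struct op_point *to)
{
    (void)from; (void)to;
    fake_notifications++;     /* e.g. tell drivers a change is coming */
    return 0;
}

static int fake_transition(const struct op_point *from, const struct op_point *to)
{
    const struct fake_md *md = to->md_data; /* only this layer looks inside */
    (void)from;
    fake_ctl_reg = md->ctl_value;
    return 0;
}

static int fake_finish(const struct op_point *from, const struct op_point *to)
{
    (void)from; (void)to;
    fake_notifications++;     /* e.g. tell drivers the change is done */
    return 0;
}

static struct fake_md md_high = { 0x0e };
static struct fake_md md_low  = { 0x06 };

struct op_point op_high = { "high", 1000000, 1300, 100,
                            fake_prepare, fake_transition, fake_finish,
                            &md_high };
struct op_point op_low  = { "low", 600000, 1000, 100,
                            fake_prepare, fake_transition, fake_finish,
                            &md_low };

/* ---- architecture independent layer ---- */
const struct op_point *current_op = &op_high;

/* Run the three hooks of the target point in order. current_op is only
 * updated if all of them succeed; md_data is never dereferenced here. */
int op_change(const struct op_point *next)
{
    int ret;

    assert(next && next->prepare && next->transition && next->finish);

    ret = next->prepare(current_op, next);
    if (ret)
        return ret;
    ret = next->transition(current_op, next);
    if (ret)
        return ret;
    ret = next->finish(current_op, next);
    if (ret)
        return ret;

    current_op = next;
    return 0;
}
```

In this sketch a caller (standing in for the user space power manager)
would simply call op_change(&op_low); afterwards current_op points at
op_low and the fake register holds 0x06. Supporting a new platform would
mean supplying a different set of op_point definitions and hooks, while
op_change() stays untouched.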
>
> x86 cpufreq/OpPoint integration
> -------------------------------
> - struct powerop hooks are expected to be arch specific but initialized by some
> cpufreq core routines
>
> - cpufreq driver still shares cpufreq core cpufreq_frequency_table structure

Correct, the cpufreq table structure is the piece that gets the
frequency and voltage right. I'm not changing the operating point
definitions for the existing processor line. I'm just simplifying the
transition to and from the existing system states.

>
> - integration with legacy cpufreq interface is completely missing

Not quite. Once the operating points are constructed, from the same
validated data, the oppointd daemon can perform the same legacy cpufreq
functionality. Governor and policy code moves out of the kernel into
the power manager.

It integrates through the same cpufreq table data, the same cpufreq
notifier lists for transitioning and scaling drivers, and moves
policy management code out of the kernel into the power daemon.

>
> - OpPoint design does not handle SMP case.
>
> PowerOP addresses all the issues mentioned above and works for SMP
> case. Integration with legacy kernel PM code (including constraints
> and standalone driver suspend/resume) and a certain userspace
> interface (basically which can be any having current PowerOP interface
> underneath) are the next steps for PowerOP approach once the correct
> brick of PowerOP layer is in place.

It does?

David

>
> Eugeny
>