So, it's time to restart this discussion :)

After all the discussion last year, Eugeny and I went back to the
drawing board to review the requirements and possible solutions. I
thought it would be best to respond to this email to remind everyone
where we left off. David Brownell's latest email on this topic (the
subject has something with cpufreq in it) is also a good one to read.

Basically, we finally agree that the operating point concept won't
work for every platform and that it is actually too limited to be the
base abstraction. Please hold applause until the end :)

We dove into A) in Dominik's email and started looking in more detail
at what a knob layer would require. For the moment, let's put aside
operating points.

We believe that a knob-type layer makes sense as the lowest level, as
Dominik proposed. This layer is responsible for controlling hardware
resources that affect power management and for capturing the
relationships between those resources. Power management resources
include components such as clock dividers, PLLs, voltage regulators,
and power domains. These resources are not always independent and
often depend on one another.

Knob isn't quite the right word for this layer - pm resources are
knobs, switches, dials :) We suggest calling this layer a Power
Parameter Framework.

The goal of this parameter framework is to expose the resources in a
way that allows other s/w (governors, policy managers, etc.) to
control the resources while keeping the system operational. One of the
main requirements in our thinking is that we want this layer to
represent the h/w and not include policy or decision making. That is,
the software using the parameter framework would be responsible for
deciding the appropriate value for each parameter.

The framework breaks down into 4 parts:

- PM resource representation
  Similar to the device abstraction available today. Platforms need to
  define which resources will be controlled by the parameter framework.
  We need to take into account that resources will come from both SoCs
  and boards.

- PM resource control
  Architecture-independent API for enable/disable and get/set of
  parameters. It also provides information such as the valid range or
  set of values for a parameter, based on hardware limitations.
  - The API would work in terms of parameter values such as
    frequencies and voltages, not register or divider values.
  - Each parameter is referenced by an id/handle to maintain
    architecture independence.
  - The set function accepts a list of parameter/value pairs as well
    as a single parameter/value pair.

- Dependency relationships
  We believe 3 types of dependencies need to be addressed.
  - Parent/Child. This relationship would be for parameters of the
    same type, such as clocks that depend on each other. Most likely a
    tree structure similar to (or exactly the same as) the clock
    framework, except generic for any type of parameter.
  - Domain. This relationship is for parameters of different types.
    For example, some platforms provide a gate for the voltage supply
    to a set of clocks. The framework would capture the relationship
    of the voltage gate to the clocks so that the information can be
    used when setting parameters.
  - Functional. Often there are platform-specific dependency
    relationships that need to be captured and addressed in some way.
    Some examples: a single register may be used to control several
    independent clocks, requiring some coordination when setting a new
    value for one or more of the clocks; one parameter may need to be
    changed before another due to some platform-specific
    peculiarities.

- Resource reference counting
  It's important to keep track of when a resource is being used. If no
  one is using a resource, then a higher-level s/w component
  (governor/policy manager) can decide to turn off the resource. The
  framework would provide a claim/release set of APIs for other
  subsystems/drivers to use.

What is and is not included in the parameter framework?
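
Before getting to that question, here is a minimal sketch in C of how
the four parts above could fit together. It is illustrative only:
every name in it (pm_param, pm_param_verify, pm_param_set_list,
pm_param_claim, ...) is hypothetical and is not taken from any
existing kernel API or from the API proposal mentioned in this mail.
It assumes a single parent pointer is enough to model the parent/child
dependency and leaves out the domain and functional dependencies.

```c
#include <errno.h>
#include <stddef.h>

/* All names below are hypothetical; this is not an existing kernel API. */
enum pm_param_type { PM_PARAM_CLOCK, PM_PARAM_VOLTAGE };

struct pm_param {
	const char *name;
	enum pm_param_type type;
	struct pm_param *parent;	/* parent/child dependency, NULL for a root */
	long min, max;			/* valid range in real units (kHz, mV) */
	long value;			/* current value, same units */
	int enabled;
	int use_count;			/* claim/release reference count */
};

struct pm_param_value {
	struct pm_param *param;
	long value;
};

/* Verification: range check plus the parent/child dependency. */
int pm_param_verify(const struct pm_param *p, long value)
{
	if (value < p->min || value > p->max)
		return -EINVAL;
	if (p->parent && !p->parent->enabled)
		return -EBUSY;	/* no changes under a disabled parent */
	return 0;
}

long pm_param_get(const struct pm_param *p)
{
	return p->value;
}

/* No policy here: the caller picks the value, we only check and apply. */
int pm_param_set(struct pm_param *p, long value)
{
	int ret = pm_param_verify(p, value);

	if (ret)
		return ret;
	p->value = value;	/* real code would program the hardware here */
	return 0;
}

/* Verify every pair first so that a bad entry leaves all values untouched. */
int pm_param_set_list(const struct pm_param_value *list, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = pm_param_verify(list[i].param, list[i].value);
		if (ret)
			return ret;
	}
	for (i = 0; i < n; i++)
		list[i].param->value = list[i].value;
	return 0;
}

void pm_param_claim(struct pm_param *p)
{
	p->use_count++;
}

/* Returns the remaining user count, or -EINVAL on an unbalanced release. */
int pm_param_release(struct pm_param *p)
{
	if (p->use_count == 0)
		return -EINVAL;
	return --p->use_count;
}
```

Note that the sketch never decides a value or turns anything off by
itself. A governor would read use_count (via the release return value
here) and make that decision, which keeps policy out of this layer.
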

- Resources that affect more than one component would go into the
  framework. For example, a clock that is used by two or more I/O
  devices would need coordination to change. Therefore it goes into
  the parameter framework. A resource that is used by only one device
  driver and doesn't affect other devices/parameters should be
  controlled directly in the device driver and not exposed in the
  parameter framework.

- The platform designer (or the guy doing the board port) decides
  which resources make sense to expose on their platform. Not all
  resources are required to be included. In fact, it may make sense to
  expose multiple resources as a single parameter.

- Use-case and value-based parameter relationships would not be
  included in the parameter framework. These relationships are not
  required to keep the system operational, and not every platform will
  have them.
  This is where operating points start to make sense. An optional
  layer on top of the parameter framework would provide the ability to
  group parameters together in a similar manner to operating points.
  If a platform has a set of optimal parameter values for specific use
  cases, then it would define parameter groups and assign a group id
  to each set of values.

Notifications
The framework needs to provide the ability to subscribe to
notifications for individual parameter changes. Device drivers would
be able to subscribe to pre- and post-change events and act
accordingly.

Verification
The framework API provides the range or the valid set of values for a
parameter so that a potential value can be verified. Also, the
parameter dependency relationships are followed when a parameter or
set of parameters is set.

If we can agree to and get a basic framework as described above in
place, we have a good building block for solving some of the other
issues such as constraints and policy decisions.

Also, we have a framework in the kernel for clocks today.
The parameter framework would incorporate the clock framework ideas,
making them generic for any type of power resource and easier to
define/use.

I believe this power parameter framework should solve many (if not
all) of the issues raised by using operating points as the base
abstraction and provide a common layer across architectures.

Eugeny and I have the beginnings of an API proposal for this
framework, but we wanted to get some high-level feedback on the
concepts so we can adjust the API if necessary.

So, comments?

Matt

On Oct 6, 2006, at 7:36 PM, Dominik Brodowski wrote:

> Hi!
>
> As you know, I never looked too friendly upon PowerOP and the
> "operating points" concept. My latest messages may have illustrated
> this point even further -- but the reason for that is that I more and
> more get the feeling that PowerOP and "operating points" and the
> so-called new "PM core" is trying to do too many things at once, and
> therefore mixes up different levels. Here is a rough sketch of what
> I'd like to discuss[1] as an alternative:
>
>
> A) The lowest level: lots of knobs.
>
> Somewhere in a "computer system"[2] there are very many "knobs" which
> may be turned to influence various voltages, clock levels, or
> operating modes ("turbo", "performance" or "powersave", for example).
>
> Also, there might be many dependencies on how these "knobs" may be
> changed.
>
> Let's assume the system is in a well-defined, working state right now.
>
>
> B) I want to change one such knob!!!
>
> Now, let's say that we want to change one value controlled by such a
> knob. What must we do? We need to check that changing it
>   a) does not violate any dependency ["verification"]
>   b) all dependencies are handled in correct order ["notification"]
>
>
> C) Notification
>
> Let's look at the "notification" stage first -- that's what current
> cpufreq notifiers do in a very basic way. However, this is also what
> the new clock and voltage frameworks are trying to do, right?
> So that's the lesser problem now.
>
>
> D) Verification
>
> So, how to do this verification? Basically, there are two approaches:
>
> 1) ask every other subsystem whether the new value is OK with it.
>    This is what cpufreq currently suggests to do. It is evident
>    that this gets overly complicated with lots of dependencies
>    and dependencies within the dependencies -- both in terms
>    of concept and in terms of time the verification code takes
>    to execute.
>    Advantages:
>     - easy to expand, also in runtime (e.g. USB system is
>       modprobed and telling you of a new minimum voltage
>       requirement on certain circumstances)
>     - does not limit choices for each knob
>    Disadvantages:
>     - might get very complex
>
> 2) look up all valid states in a table
>    This is basically what PowerOP and the "operating points"
>    concept suggests: if you want to change one value, you check
>    what operating points a) contain the new value and b) is
>    most suitable to you.
>    Advantages:
>     - fast
>     - pre-defined set of operating points which the system
>       designer is comfortable with
>    Disadvantages:
>     - needs to be limited to "core" of the system as else
>       the tables may get overly large
>     - limits the choices
>
>
> E) So, why not combine the best of both worlds?
>
> If you want to change a knob, the "PM core" looks both at every other
> subsystem adding dependencies, and at a "operating points" table
> _ifff_ it exists.
>
>
> F) So, how would this work for OMAP1?
>
> Let's limit it, to keep it somewhat simple, to the values contained in
> your "struct pm_core_point" for OMAP:
>
>         int cpu_vltg;   /* voltage in mV */
>         int dpll;       /* in KHz */
>         int cpu;        /* CPU frequency in KHz */
>         int tc;         /* in KHz */
>         int per;        /* in KHz */
>         int dsp;        /* in KHz */
>         int dspmmu;     /* in KHz */
>         int lcd;        /* in KHz */
>
> and let's also add a
>
>         int i_am_special;
>
> Let's assume that there is an OMAP1 PM module which implements a
> ->set and ->get function for all of them.
> A yet-to-be-defined interface then tells this PM module
>
>         "I want to increase the CPU frequency from C1 MHz to C2 MHz!"
>
>         ->set(CPU_VLTG, C2);
>
> The ->set function would then ask whether it is allowed to switch to
> frequency B. How would it ask for that? It would both call the
> "operating points" layer to check whether such a table is registered.
> Now, let's assume there are no external subsystems affected by this
> change, and the system engineer has defined such a table:
>
>         Nr.  CPU_VLTG  CPU  TC  ...  i_am_special
>         1    A1        C1   D1       1
>         2    A2        C1   D1       2
>         3    A1        C2   D2       3
>         4    A2        C2   D3       4
>
> The core would determine that the latter two states are now allowed,
> and using some sensible algorithm (e.g. "where do I not have to switch
> too many knobs", or minimize the costs of switching) decide between
> those two. Basically, it would recognize now that it is OK to proceed
> from state Nr. 1 to Nr. 3, but that this means that "tc" also needs to
> be changed. After notifying relevant subsystems using the clock and
> voltage frameworks, it would then proceed to set the hardware
> accordingly.
>
> Now, some might argue "I want to tell the interface to enter
> mp3-mode, and not enter some CPU_VLTG and hope that it selects the
> right table entry then in the verification stage!" Well, you can do
> that. Using the i_am_special pseudo-knob. You just tell the
> yet-to-be-defined interface "I want to switch knob I_AM_SPECIAL to 4".
> The process is the same.
>
>
> G) So, what does this get us?
>
> It may look as "Operating Points" turned on its head now. And yes, it
> is. But you can do the following now:
> - let cpufreq call ->set(CPU_FREQ, <value>), if you want dynamic
>   frequency scaling,
> - use pre-defined operating points if it's suitable to do so,
> - handles all dependencies either way.
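
A minimal sketch in C of the table lookup walked through in F) above,
using the "where do I not have to switch too many knobs" rule. This is
an assumption-laden illustration, not code from PowerOP or cpufreq:
the names (op_point, op_table, op_distance, op_select) are
hypothetical, the numeric values merely stand in for the symbolic
A1/A2, C1/C2 and D1..D3, and the request is made on the CPU frequency
column. It covers only the table half of E); asking other subsystems
for their dependencies is left out.

```c
#include <stddef.h>

/* Hypothetical names; the numbers are made-up stand-ins for A1, C1, D1, ... */
enum knob { CPU_VLTG, CPU, TC, I_AM_SPECIAL, NR_KNOBS };
enum { A1 = 1100, A2 = 1300 };			/* mV */
enum { C1 = 96000, C2 = 192000 };		/* kHz */
enum { D1 = 48000, D2 = 96000, D3 = 64000 };	/* kHz */

struct op_point {
	int val[NR_KNOBS];
};

/* The four-entry table from the mail (index 0 is "Nr. 1"). */
const struct op_point op_table[] = {
	{ { A1, C1, D1, 1 } },
	{ { A2, C1, D1, 2 } },
	{ { A1, C2, D2, 3 } },
	{ { A2, C2, D3, 4 } },
};

/* Number of knobs that differ between two operating points. */
int op_distance(const struct op_point *a, const struct op_point *b)
{
	int k, diff = 0;

	for (k = 0; k < NR_KNOBS; k++)
		if (a->val[k] != b->val[k])
			diff++;
	return diff;
}

/*
 * Among the entries that contain the requested value for the given knob,
 * pick the one that needs the fewest knob changes from the current entry.
 * Ties go to the lower index. Returns the table index, or -1 if no entry
 * contains the value.
 */
int op_select(const struct op_point *table, size_t n, size_t cur,
	      enum knob knob, int value)
{
	int best = -1, best_dist = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int dist;

		if (table[i].val[knob] != value)
			continue;
		dist = op_distance(&table[cur], &table[i]);
		if (best < 0 || dist < best_dist) {
			best = (int)i;
			best_dist = dist;
		}
	}
	return best;
}
```

With this table, op_select(op_table, 4, 0, CPU, C2) returns index 2,
i.e. the move from Nr. 1 to Nr. 3 described above, and asking for
I_AM_SPECIAL = 4 returns index 3 (Nr. 4). A cost-based rule could
replace op_distance without changing the rest.
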
>
> Oh, and as the operating point concept is only introduced as an
> element between the low-level setting and the "high-level policy
> decision", it does not need to be squeezed into current cpufreq
> drivers or even the current cpufreq core in any way. cpufreq may call
> it, but that should be relatively easy to implement.
>
>
> I think that this might be much easier to implement than your
> PowerOP / operating points / PM core / PowerOP - cpufreq interaction
> patches. As a matter of fact, some parts of your operating points
> table infrastructure may be usable for the concept outlined above.
> So, what do you think? What does everyone else involved think about
> this alternative approach?
>
>
> Thanks,
>         Dominik
>
>
> [1] As many here are aware, I will have very limited time to actually
>     implement it.
> [2] embedded device, notebook, cluster, desktop with lots of USB
>     devices connected, and so on
> _______________________________________________
> linux-pm mailing list
> linux-pm@xxxxxxxxxxxxxx
> https://lists.osdl.org/mailman/listinfo/linux-pm