On 12/6/2012 7:01 AM, David C Niemi wrote:
> My point is that performance vs. power is not just a linear continuum of preferences. How idle and full speed are handled are of particular importance in many applications. I think we both agree the existing governors are obsolete and do things the wrong way. But we attach different meanings to "policy" and may have different ideas of what should be. I think of policy as very high level and totally compatible with a variety of very different hardware implementations.
>
> The minimum a true high-level policy "governor" would need to do is this:
> a) determine what the hardware's capabilities are (init)
> b) provide a configuration interface analogous to what we have now but much higher-level and less frequency-centric
> c) assess system load on an ongoing basis.
> d) control the power management driver based on the user preferences and the system load pattern.
>
> The lines between governor and driver could be drawn in various places, but the point of having some sort of governor is to not have to reimplement the whole stack for every driver.
the sad part is that this is where reality has caught up with the nice theory. hardware keeps innovating/changing around power behavior... very very fundamentally. When we started CPUFREQ (yes I was there ;-) ) we had the assumption that a clean split between hardware and governor was possible. Even back then, Linus balked at that and made us change it at least somewhat.... the Transmeta CPUs at the time showed enough differences already to break. We made, at the time, the minimal changes possible. But really the whole idea does not work out.
> The exposed configuration interface might be as simple as choosing one of several discrete settings:
> - max single-threaded performance
> - max multi-threaded performance
these are identical on today's silicon btw; or rather, this is not a P state choice item, but a task scheduler policy item.
> - "server" setting -- save power but only in ways that do not affect performance
this is a fiction btw... if there was a way to reduce power and not affect performance, that's your "max performance" setting. anything else will sacrifice SOME performance from max...
> - "default" -- a good general-purpose middle of the road setting that performs pretty well and also saves power
... so you end up at this one.
> - "on battery" setting -- provide good interactive responsiveness but aggressively save power, potentially making long-running tasks take longer
battery has nothing to do with power preference. Just ask any data center operator.
> The above is what I think of as policy. There is nothing hardware-specific about these.
> These say nothing directly about what frequency to run or whether to use P-States.

and defining a common policy interface I'm quite fine with (not quite in the way you defined it, but ok...)
But that's not going to lead to a common implementation as a "governor" ;-(

My idea for a policy "dial" is mostly
* Uncompromised performance
* Balanced - biased towards performance (say, defined to be lowest power with at most a 2 1/2% perf hit)
* Balanced (say, at most a 5% perf hit)
* Balanced - biased towards lower power (say, at most a 10% perf hit)
* Uncompromised lowest power

we can argue about the exact %ages, but the idea is to give at least some reasonable definition that people can understand, but that also can be measured
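
As a rough illustration of the "dial" above, here is a minimal sketch in C. Everything in it is hypothetical: the enum names, the `operating_point` structure, and `pick_operating_point()` are made up for this example and are not an existing kernel interface. The perf-hit budgets are just the placeholder percentages from the list (stored in tenths of a percent so 2 1/2% stays an integer), and the sketch assumes the hardware-specific side can somehow report a power figure and a perf hit for each operating point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy dial; names and budgets are illustrative only. */
enum power_policy {
    POLICY_PERFORMANCE,     /* uncompromised performance */
    POLICY_BALANCED_PERF,   /* lowest power with at most a 2.5% perf hit */
    POLICY_BALANCED,        /* at most a 5% perf hit */
    POLICY_BALANCED_POWER,  /* at most a 10% perf hit */
    POLICY_POWERSAVE,       /* uncompromised lowest power */
    POLICY_COUNT
};

/* Allowed perf loss in tenths of a percent; -1 means "no limit". */
static const int policy_max_perf_hit[POLICY_COUNT] = { 0, 25, 50, 100, -1 };

/* Assumed to be supplied by the hardware-specific driver. */
struct operating_point {
    int power_mw;   /* estimated power draw at this point */
    int perf_hit;   /* perf loss vs. the fastest point, tenths of a percent */
};

/*
 * Return the index of the lowest-power operating point whose perf hit
 * fits within the budget of the chosen policy, or -1 if none qualifies.
 */
int pick_operating_point(enum power_policy policy,
                         const struct operating_point *pts, size_t n)
{
    assert((int)policy >= 0 && (int)policy < POLICY_COUNT);

    int budget = policy_max_perf_hit[policy];
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (budget >= 0 && pts[i].perf_hit > budget)
            continue;   /* costs more performance than the policy allows */
        if (best < 0 || pts[i].power_mw < pts[best].power_mw)
            best = (int)i;
    }
    return best;
}
```

Note that with a budget of 0 the "uncompromised performance" setting still picks the lowest-power point among those with no perf hit, which matches the remark earlier in the thread that any power saving without a performance cost belongs in the max performance setting. How a driver would actually measure the perf hit is exactly the hardware-specific part, and is left open here.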