[linux-pm] cpufreq terminally broken [was Re: community PM requirements/issues and PowerOP]

david-b at pacbell.net (David Brownell) · Tue, 27 Feb 2007 12:55:42 -0800

Catching up on some ancient mail from one mbox ..

On Wednesday 13 September 2006 4:50 pm, David Singleton wrote:
> OpPoint constructs operating points for all supported frequency, voltage
> and suspend states for PC and SoC solutions running Linux.

That's one basic issue I have with such approaches to describing operating
points:  the set of "all" such states gets to be enormous.  What I've seen of
both PowerOP and OpPoint says that they both try to limit that set by just
enumerating a handful of specific operating points ... but the more generic
solution (generally matching chip specs) would be having a way to constrain
the parameters within their natural limits.  (Rather than picking out a set
of half a dozen system modes in advance, by hand.)

 - With CPU clock at AAA MHz, chip voltage K must be between A1 and A2 volts;
   but the other clocks only need to follow their usual rules.   This
   defines a set of many operating points.

 - The MMC driver needs to have power supply P output 3.3 V at 80 mA and
   have clock D active.  (Presumably, a different set of operating points.)

 - If clock D is active, the SOC chip can't enter power state X; but again,
   other clocks can use their normal rules.  Again, many operating points.

 - While UART U3 is set to 115200 baud, certain values of clock M aren't
   allowed; but there are no other constraints, so that many operating
   points are compatible with that configuration.

 - Those chips must be in power state X for that module to enter power
   state Y; other modules can be in any power state.

 - That module must be in power state Y for the system to enter power
   state Z.

 - Because of chip errata, <these> parameter combinations (or transitions)
   are invalid; don't trigger them.
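
To make the "constraints, not enumerated points" idea concrete, here is a
minimal sketch in Python.  Everything in it is hypothetical: the parameter
names, the frequency/voltage numbers, and the two rules are stand-ins for
what a chip spec would provide, not values from any real SoC.  It only shows
that a couple of rules like the ones above leave many valid operating points.

```python
from itertools import product

# Hypothetical parameter space; names and values are illustrative only.
PARAMS = {
    "cpu_mhz": [100, 200, 400],
    "core_mv": [1000, 1100, 1200, 1300],
    "clock_d": [False, True],
    "soc_state": ["run", "X"],
}

def cpu_voltage_ok(p):
    # "With CPU clock at AAA MHz, voltage K must be between A1 and A2."
    # The limits table is an assumption made up for this sketch.
    limits = {100: (1000, 1300), 200: (1100, 1300), 400: (1200, 1300)}
    lo, hi = limits[p["cpu_mhz"]]
    return lo <= p["core_mv"] <= hi

def clock_d_blocks_state_x(p):
    # "If clock D is active, the SOC chip can't enter power state X."
    return not (p["clock_d"] and p["soc_state"] == "X")

CONSTRAINTS = [cpu_voltage_ok, clock_d_blocks_state_x]

def valid_points(params, constraints):
    """Yield every parameter combination that violates no constraint."""
    names = list(params)
    for values in product(*(params[n] for n in names)):
        point = dict(zip(names, values))
        if all(check(point) for check in constraints):
            yield point

points = list(valid_points(PARAMS, CONSTRAINTS))
print(len(points), "valid operating points")
```

Even with four small parameters and two rules, the valid set is far larger
than a hand-picked half dozen, and each new rule is one more function rather
than a rewrite of a table.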

Cataloguing every possible power-related parameter seems like a losing
game, even on relatively tiny systems which pay attention to power usage
from within each driver ... and doomed to failure in larger scenarios,
like that 256-core case.

It seems that I'm actually criticizing the notion of "operating point" as
a model to expose as a power management target ...

It's simple to say that the system is at a particular operating point,
and that it's an operating point that works well for MP4 playback.  That's
like saying "it's warm today"; there are many kinds of warm day.  It's
purely descriptive, and omits lots of relevant details.  (Rainy too?)

But I really don't think it would be common for that to be the _only_ such
operating point ... simple counter-examples include the MMC and UART cases
above, considering that playback could often work with or without MMC active,
with or without UART at 115200 baud.  Ergo, multiple operating points support
MP4 playback, ergo "operating point" isn't the key notion that would need to
be exposed.  QED.
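
The counting argument can be sketched in a few lines of Python.  The knobs
and the "playback needs at least 200 MHz" rule are assumptions chosen for
illustration; the point is only that the requirement is indifferent to the
MMC and UART settings, so several operating points satisfy it.

```python
from itertools import product

# Hypothetical knobs; the values are made up for this sketch.
CPU_MHZ = [100, 200, 400]
MMC_ACTIVE = [False, True]
UART_115200 = [False, True]

def supports_mp4(cpu_mhz, mmc_active, uart_115200):
    # Assumption: playback only cares about CPU speed, not MMC or UART.
    return cpu_mhz >= 200

mp4_points = [
    point
    for point in product(CPU_MHZ, MMC_ACTIVE, UART_115200)
    if supports_mp4(*point)
]
print(len(mp4_points), "operating points support MP4 playback")
```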

Now, where does that leave us?  I think it leaves us looking at how those
constraints get expressed (by e.g. device drivers for clocks and voltages,
ditto cpufreq drivers) and to what they get expressed (clock framework,
voltage framework, maybe a CPU horsepower manager the scheduler talks to).

So for that MP4 example, one could alert the video driver to do its clock
setup, a horsepower manager to say "intensive software decode load coming
up for this RT task", and one of the relevant operating points would be
entered.  And if the video data were coming from the MMC card, a slightly
different one would be entered as part of starting that data stream; etc.
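
A toy model of that flow, in Python, might look like the following.  The
ConstraintManager class, the resource names, and the numbers are all
hypothetical; this is not the kernel clock framework or any existing API,
just a sketch of drivers stating minimum needs and the resulting state being
derived from them rather than picked from a table.

```python
class ConstraintManager:
    """Toy stand-in for a clock/voltage framework; not a real kernel API."""

    def __init__(self):
        # (owner, resource) -> minimum value that owner needs
        self._requests = {}

    def request(self, owner, resource, minimum):
        self._requests[(owner, resource)] = minimum

    def release(self, owner, resource):
        self._requests.pop((owner, resource), None)

    def resolve(self):
        """Return the lowest setting per resource that satisfies everyone."""
        state = {}
        for (_, resource), minimum in self._requests.items():
            state[resource] = max(state.get(resource, 0), minimum)
        return state

mgr = ConstraintManager()

# Video driver does its clock setup; horsepower manager is told about
# the upcoming decode load.  (Names and numbers are assumptions.)
mgr.request("video", "video_clk_mhz", 54)
mgr.request("horsepower", "cpu_mhz", 400)
playback = mgr.resolve()

# Data starts streaming from the MMC card: a slightly different
# operating point falls out of the added requests.
mgr.request("mmc", "supply_p_mv", 3300)
mgr.request("mmc", "clock_d_mhz", 48)
playback_from_mmc = mgr.resolve()

# Stream ends; the MMC requests go away and the earlier point returns.
mgr.release("mmc", "supply_p_mv")
mgr.release("mmc", "clock_d_mhz")
after_stream = mgr.resolve()
```

Nothing here names an operating point; the two playback states differ only
because the set of outstanding requests differs.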

Or in the 256-cpu example, just alert the horsepower manager that a huge
simulation job is upcoming ... that is, if the scheduler doesn't just do
that automatically when it notices it'd be a nice time to bring up some
of those currently-downed cores.  (Automatic up/down of cores may not
behave well in all cases.  By analogy:  madvise gives VM advice that can't
be guessed by the kernel; schedulers may need similar advice.)
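
As a rough Python sketch of the advice idea: the policy below is invented for
illustration (one core per runnable or announced task), and hinted_tasks is a
hypothetical parameter playing the role madvise plays for VM.  Without the
hint the policy can only react to what is already runnable.

```python
def cores_to_online(total_cores, runnable_tasks, hinted_tasks=0):
    """Hypothetical policy: one core per runnable or announced task.

    Always keeps at least one core up and never exceeds total_cores.
    """
    wanted = max(runnable_tasks, hinted_tasks, 1)
    return min(total_cores, wanted)

# Reactive only: four tasks runnable right now, so four cores.
reactive = cores_to_online(256, runnable_tasks=4)

# With advice that a 256-way simulation is about to start.
advised = cores_to_online(256, runnable_tasks=4, hinted_tasks=256)
```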

Comments?

- Dave