(Cc folks with some DVFS interest)
Hi Colin,
On Fri, 22 Apr 2011, Colin Cross wrote:
Now that we are approaching a common clock management implementation,
I was thinking it might be the right place to put a common dvfs
implementation as well.
It is very common for SoC manufacturers to provide a table of the
minimum voltage required on a voltage rail for a clock to run at a
given frequency. There may be multiple clocks on a voltage rail, each
specifying its own minimum voltage, and one clock may affect multiple
voltage rails. I have seen two ways to handle keeping the clocks and
voltages within spec:
The Tegra way is to put everything dvfs-related under the clock
framework. Enabling (or preparing, in the new clock world) or raising
the frequency calls dvfs_set_rate before touching the clock, which
looks up the required voltage on a voltage rail, aggregates it with
the other voltage requests, and passes the minimum voltage required to
the regulator api. Disabling, unpreparing, or lowering the frequency
changes the clock first, and then calls dvfs_set_rate. For
a generic implementation, an SoC would provide the clock/dvfs
framework with a list of clocks, the voltages required for each
frequency step on the clock, and the regulator name to change. The
frequency/voltage tables are similar to OPP, except that OPP gets
voltages for a device instead of a clock. In a few odd cases (Tegra
always has a few odd cases), a clock that is internal to a device and
not exposed to the clock framework (pclk output on the display, for
example) has a voltage requirement, which requires some devices to
manually call dvfs_set_rate directly, but with a common clock
framework it would probably be possible for the display driver to
export pclk as a real clock.
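To make that ordering concrete, here is a minimal sketch of the rule
described above, assuming a hypothetical per-clock struct dvfs and
table lookup (only dvfs_set_rate is named above; everything else is
illustrative, and the aggregation across clocks sharing the rail is
left out): going up raises the rail before the clock, going down drops
the clock before the rail.

#include <linux/clk.h>
#include <linux/errno.h>
#include <linux/regulator/consumer.h>

struct dvfs {
	struct clk *clk;
	struct regulator *reg;
	const unsigned long *freqs;	/* sorted ascending, in Hz */
	const int *min_uV;		/* matching minimum voltages */
	int num_entries;
	int max_uV;			/* rail max for regulator_set_voltage() */
};

/* Find the minimum voltage that supports the requested rate. */
static int dvfs_table_lookup(struct dvfs *d, unsigned long rate)
{
	int i;

	for (i = 0; i < d->num_entries; i++)
		if (rate <= d->freqs[i])
			return d->min_uV[i];
	return -EINVAL;
}

static int dvfs_set_rate(struct dvfs *d, unsigned long rate)
{
	unsigned long old_rate = clk_get_rate(d->clk);
	int uV = dvfs_table_lookup(d, rate);
	int ret;

	if (uV < 0)
		return uV;

	if (rate > old_rate) {
		/* going up: raise the rail first, then the clock */
		ret = regulator_set_voltage(d->reg, uV, d->max_uV);
		if (ret)
			return ret;
		return clk_set_rate(d->clk, rate);
	}

	/* going down: drop the clock first, then relax the rail */
	ret = clk_set_rate(d->clk, rate);
	if (ret)
		return ret;
	return regulator_set_voltage(d->reg, uV, d->max_uV);
}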
Those kinds of exceptions are somehow the rule on an OMAP4 device.
Most scalable devices use some internal dividers or even an internal
PLL to control the scalable clock rate (DSS, HSI, MMC, McBSP... the
OMAP4430 Data Manual [1] provides the various clock rate limitations
depending on the OPP).
And none of these internal dividers are handled by the clock fmwk today.
For sure, it should be possible to extend the clock data with internal
device clock nodes (like the UART baud rate divider, for example), but
then we will have to handle a bunch of nodes that may not always be
available depending on the device state. In order to do that, you have
to tie these clock nodes to the device that contains them.
And for the clocks that do not belong to any device, like most PRCM
source clocks or DPLLs inside OMAP, we can easily define a PRCM device
or several CM (Clock Manager) devices that will handle all these clock
nodes.
The proposed OMAP4 way (I believe, correct me if I am wrong) is to
create a new api outside the clock api that calls into both the clock
api and the regulator api in the correct order for each operation,
using OPP to determine the voltage. This has a few disadvantages
(obviously, I am biased, having written the Tegra code): clocks and
voltages are tied to a device, which is not always the case for
platforms outside of OMAP, and drivers must know whether their hardware
requires voltage scaling. The clock api becomes unsafe to use on any
device that requires dvfs, as it could raise the frequency beyond what
the current voltage supports.
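For comparison, here is a minimal sketch of that device-centric
layering, assuming the OPP helpers of the time (opp_find_freq_ceil()
and opp_get_voltage() from linux/opp.h, called under rcu_read_lock());
the dev_scale() name and the way the clock and supply are passed in
are purely illustrative, not the actual OMAP proposal.

#include <linux/clk.h>
#include <linux/device.h>
#include <linux/err.h>
#include <linux/opp.h>
#include <linux/rcupdate.h>
#include <linux/regulator/consumer.h>

static int dev_scale(struct device *dev, struct clk *clk,
		     struct regulator *reg, unsigned long rate)
{
	struct opp *opp;
	int uV;
	int ret;

	/* The voltage comes from the device's OPP table, not the clock. */
	rcu_read_lock();
	opp = opp_find_freq_ceil(dev, &rate);
	if (IS_ERR(opp)) {
		rcu_read_unlock();
		return PTR_ERR(opp);
	}
	uV = opp_get_voltage(opp);
	rcu_read_unlock();

	if (rate > clk_get_rate(clk)) {
		/* scaling up: raise the voltage before the clock */
		ret = regulator_set_voltage(reg, uV, uV);
		if (ret)
			return ret;
		return clk_set_rate(clk, rate);
	}

	/* scaling down: drop the clock before the voltage */
	ret = clk_set_rate(clk, rate);
	if (ret)
		return ret;
	return regulator_set_voltage(reg, uV, uV);
}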
You have to tie clock and voltage to a device. Most of the time a clock
does not have any clear relation with a voltage domain; it can even
cross power / voltage domain boundaries without any issue.
The efficiency of the DVFS technique comes mainly from reducing the
voltage of the rail that supplies a device. In order to achieve that,
you have to reduce the clock rate of one or several clock nodes that
feed the critical path inside the HW.
The clock node itself does not know anything about the device, and
that is why it is not the proper structure to do DVFS.
OMAP moved away from using the clock nodes to represent IP blocks
because the clock abstraction was not enough to represent the way an IP
interacts with clocks. That's why omap_hwmod was introduced to
represent an IP block.
Is the clock api the right place to do dvfs, or should the clock api
be kept simple, and more complicated operations like dvfs be kept
outside?
In terms of SW layering, so far we have the clock fmwk and the regulator
fmwk. Since DVFS is about both clock and voltage scaling, it makes more
sense to me to handle DVFS on top of both existing fmwks. Let's stick to
the "do one thing and do it well" principle instead of hacking an
existing fmwk with what I consider to be unrelated functionality.
Moreover, the only existing DVFS SW on Linux today is CPUFreq, so
extending this fmwk into a devfreq kind of fmwk seems a more logical
approach to me.
The important point is that, IMO, the device should be the central
component of any DVFS implementation. Both clock and voltage are just
some device resources that have to change synchronously to reduce the
power consumption of the device.
Because the clock is not the central piece of the DVFS sequence, I don't
think it deserves to handle the whole sequence including voltage scaling.
A change to a clock rate might trigger a voltage change, but the
opposite is true as well: a reduction of the voltage could trigger a
clock rate change inside all the devices that belong to the voltage
domain.
Because of that, both fmwks are siblings. This is not a parent-child
relationship.
Another important point is that, in order to trigger a DVFS sequence,
you have to do some voting to take shared clocks and shared voltage
domains into account.
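A minimal, self-contained sketch of that voting idea, assuming a
hypothetical "rail" object that just keeps the highest minimum-voltage
request among its clients (a real implementation would sit behind the
regulator/DVFS layer and would need locking):

#include <stdio.h>

#define MAX_CLIENTS	8

struct rail {
	int min_uV[MAX_CLIENTS];	/* per-client request, 0 = no request */
	int nclients;
};

/* Record one client's request and return the voltage the rail must hold. */
static int rail_vote(struct rail *r, int client, int min_uV)
{
	int i, highest = 0;

	r->min_uV[client] = min_uV;
	for (i = 0; i < r->nclients; i++)
		if (r->min_uV[i] > highest)
			highest = r->min_uV[i];
	return highest;
}

int main(void)
{
	struct rail vdd_core = { .nclients = 3 };

	/* two devices on the shared rail ask for different minimums */
	printf("rail -> %d uV\n", rail_vote(&vdd_core, 0, 1100000));
	printf("rail -> %d uV\n", rail_vote(&vdd_core, 1, 1250000));
	/* the first one relaxes its request, but the rail can only drop
	   as far as the remaining highest vote */
	printf("rail -> %d uV\n", rail_vote(&vdd_core, 0, 1000000));
	return 0;
}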
Moreover, playing directly with a clock rate is not necessarily
appropriate or sufficient for some devices. For example, the
interconnect should expose a BW knob instead of a clock rate one.
In general, more abstract information like BW, latency or a
performance level (P-state) is what should be exposed at the driver
level.
By exposing such knobs, the underlying DVFS fmwk will be able to do
voting based on all the system constraints and then either set the
proper clock rate using the clock fmwk, if the divider is exposed as a
clock node, or let the driver convert the final device recommendation
using whatever register adjusts the critical clock path rate.
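As a rough illustration of such a knob, here is a minimal sketch
assuming a hypothetical 32-bit wide interconnect whose divider is
exposed as a clock node; bus_set_bandwidth() and the numbers are made
up for the example, and a real implementation would also aggregate the
requests of all bus users and round the rate up rather than to the
closest value.

#include <linux/clk.h>

#define BUS_WIDTH_BYTES	4	/* 32-bit wide interconnect, assumed */

struct bus_handle {
	struct clk *clk;	/* interconnect functional clock */
};

/*
 * The driver states what it needs (bandwidth in bytes/s); the helper
 * converts that into a clock rate and lets the clock fmwk round it to
 * something the dividers can actually produce.
 */
static int bus_set_bandwidth(struct bus_handle *bus, unsigned long bytes_per_sec)
{
	unsigned long rate = bytes_per_sec / BUS_WIDTH_BYTES;
	long rounded = clk_round_rate(bus->clk, rate);

	if (rounded < 0)
		return rounded;
	return clk_set_rate(bus->clk, rounded);
}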
Regards,
Benoit
[1] http://focus.ti.com/pdfs/wtbu/OMAP4430_ES2.x_DM_Public_Book_vC.pdf