Re: Alternative Concept

David Brownell <david-b@xxxxxxxxxxx> · Thu, 15 Mar 2007 19:21:54 -0700

On Thursday 15 March 2007 6:04 am, Igor Stoppa wrote:
> On Thu, 2007-03-15 at 12:53 +0300, ext Eugeny S. Mints wrote:
> > > On Wednesday 14 March 2007 3:43 am, Eugeny S. Mints wrote:
> > >>> Would this involve replacing the clock framework, or are they going to coexist?
> > >> parameter framework would eventually replace clock framework.
> > > 
> > > That seems to be the wrong answer.  Especially since nothing has
> > > been shown to be wrong with the clock interface; much less to be
> > > unfixably wrong (hence justifying replacement).
> > 
> > a cherry-picking on clk fw API:
> > 
> > - clk_set_rate() sticks to an individual clock - no way to set rates for number 
> >   of clocks at once instead of having series of clk_set_rate() calls.

This isn't "wrong", it's a "lack of feature".  The normal process for
addressing such things is to improve working/deployed software interfaces,
not throw them out and try to create and deploy something new.  :)

Plus, that's not actually true.  When you change the rate for a parent
clock, all its derived clocks necessarily change ... however many they
may be.  And there's nothing to prevent updating non-derived clocks.

Of course, such a change *does* touch on something I'd agree is a notable
omission:  requests to change such a clock rate (call it a "base" rate
for purposes of discussion here) are currently safe only if the related
clocks have no drivers using them, since there's no way for those driver
to either veto the change ("I can't support 115200 baud at that base rate"),
or reclock to handle the changed environment ("update clock divisor to
maintain current serial port baud rate").

Which is why currently most clock rates aren't currently changeable; and
those which can be changed are mostly leaves on the clock tree.

On some platforms that significantly impacts what cpufreq (or whatever
DVFS scheme is used) could achieve.

> > The former  
> > for example is required for clocks which have required predefined ratio 
> > dependency though: two clocks always have to be n:2n and any other ratio leads 
> > to hardware misbehavior.

The programming interface doesn't preclude enforcement of that rule
though.  In fact, I'd call any *implementation* buggy if the hardware
has such a rule about base rates, but the software doesn't enforce it.
(That is:  it's not an interface issue.)

> > So, you can't go from 100:200 to 300:600 via series of  
> > clk_set_rate() for such clocks, i.e. 100:200 -> 300:200 -> 300:600 or 100:200 -> 
> > 100:600 -> 300:600 (see SH7722 hw for reference)

AT91 (e.g. at91sam9263) processors have MCKR with similar restrictions,
but the only time this could ever matter seems to be for cpufreq.

Does this matter to you for any other reason than trying to use the
clock framework to implement cpufreq?  If not, there's no need for
the clock framework to define how to do this ... since cpufreq is
so highly platform-specific.

If you want to say that this makes it hard to implement cpufreq in terms
of the clock API, I'd agree ... but then, the *complete* lack of driver
clock change notifications is a far bigger issue.  Drivers like MMC/SD,
UART, SPI, and I2C commonly need to update internal clock settings when
rates change, and may need to plug their lowlevel I/O queues while those
base clock rates change. 

> What's wrong with expanding the clk_fw?
> All is needed is:
> clk_set_rate_buffered(clk1, 300);
> clk_set_rate_buffered(clk2, 600);
> clk_rate_flush(); /* can include validation of the set */
> 
> Which is, incidentally, what OMAP2 does in hw for all the relevant clk
> dividers and it also provides validation for the new set of values.
> 
> Furthermore if the original assumption that complex transitions are
> allowed only atomically (p1A, p1B) => (p2A, p2B), hw support is
> mandatory, otherwise the transition is impossible, no matter what fancy
> sw fw is performing it.

I think Scott's "power transaction" notion is a bit better, since
it should among other things provide a way to cope with cases like
two concurrent tasks trying issue conflicting changes.  An SMP
system ought to be able to reclock two or more CPU cores at once;
and maybe even do it with one "power transaction".

> > - only a rate can be set up via clk_set_rate() while for a PLL I want to set up 
> > a desired idle state as well (btw there can be more than one idle state)
>
> Is there any PLL which provide automatic switching between running/idle state?
> Do you have any HW architecture on your mind?

And why should such a hardware-specific mechanism show up in an
all-platforms API?  There's quite a lot of hardware-specific stuff
going on, and that seems like just one more instance of it.

> > - current API does not provide support for clock domains
>
> So?

Exactly.  :)

And again, that's not strictly true either; the clock tree itself
defines one kind of domain, also known as "subtrees".

A point I'll make below with respect to power domains:  unless
those domains need to be exposed for some reason to drivers,
there is no need for these to show up the programming interface
except as implementation artifacts.  As in:  activate this clock,
and platform-specific stuff happens.  It always does, of course,
but in some cases it will be more extensive than in others.

(Activating a PLL normally takes more than setting a bit to
open a clock gate; activating some clocks might invove kicking
some clock or power domain logic.)

> > a cherry-picking on clk fw implementation:

Which platform's implementation?  I've seen several...

> > - clock tree structure and traversing clock tree code are duplicated in every 
> > architecture while can be done in arch independent way and just once (hint: what 
> > indeed is an arch specific thing it's a clock tree configuration)
>
> that looks more like an optimisation to the clk fw, not a call for a brand new one

And I'm not even sure calling it an "optimization" is correct.

Given how arch-specific the tree structure is, it's not clear
that traversing it shouldn't be arch-specific too.  And in any
case ... the traversal code I've seen is so trivial I'd have a
hard time worring about sharing it.  Implementation flexibility
seems to be widely useful in handling arch-specific differences.

> > >> Separate clock and voltage frameworks

Not that we _have_ a voltage framework yet, e.g. "turn on that
that power rail, configured at 3.3V".

Plus rememember that a voltage framework is not synonymous with
a DVFS kit ... and that a DVFS kit may need to interact with a
lot more than just voltage and frequency controls.  Example,
reclocking CPU and memory frequency may need to run out of SRAM
and without accessing page tables stored in DRAM.

> > >> lead to code and functionality duplication

But since a clock is not a voltage, the primitives must differ.
There's no analogue of "allocate 80 mA" for a clock, as there
would be with a voltage framework's power budget primitives.

> > >> and do not address   
> > >> such things as relationship between clocks and voltages, clock/voltage/power 
> > >> domains, etc needed for aggressive power management.
>
> relationships are very arch specific so having or not the fw there
> wouldn't probably make any real difference.
> About power domains ... well, what's the deal?
> Aren't they controlled like voltages, in the end?

The last time I looked at a power domain, it was more like a
prerequisite to using various device clocks.  It wasn't something
that needed to be exposed in an API ... it would suffice to
have it be activated/deactivated as a side effect of accessing
those clocks.

That is, the existing hooks suffice.  The platform-specific code
to enable a given clock subtree would just have a few lines of
code teaching it how to enable the relevant power domain.  They
already rely on platform-specific logic to sort out which register
bank to touch, which bits there, and various other things.

Although that recent work for memory partitioning is relevant
here too:  being able to power down banks of DRAM (ensuring they
hold no data or code that's currently needed) can certainly cut
down one category of power drain.

In the same vein, some utilities to manage on-chip SRAM would
likely be helpful.  It's common that certain low power modes
disable all DRAM, leaving only (say) 16 KB of SRAM available
for running code ... or that DRAM be unavailable while CPU
and/or memory are being reclocked (maybe by a DVFS scheme,
using cpufreq or whatever).

> > > Most clocks don't have those issues.  Why penalize all clocks for
> > > issues which only relate to a few?  Better to only do that for the
> > > few cocks which have such additional constraints.
> > 
> > parameter fw would give exactly this behavior: relationship between clocks and 
> > voltages, clock/voltage/power domains implementation would be just additional 
> > arcs between nodes for clock and voltages. Nowadays clk fw has only one type of 
> > arcs - parent-child type. parameter fw would bring additional types. But clock 
> > nodes would be linked just with required arcs (of required type; both are arch 
> > specific things) so for an arch without any additional dependencies between 
> > clocks (and voltages) parameter fw would end up with exactly equivalent clock 
> > tree as clk fw for the arch has today.
> 
> I think what has to be clarified, is wether there is a real need for new
> features or not. 

Not just "new features" but "new cross-platform features".  This PM stuff
is really down'n'dirty code ... who will be working with such code?  The less
we demand of those people, the more likely it is that such features would
actually be delivered in working form.  A lot of platform vendors work
with older kernel releases (2.6.10 etc), which means that new APIs will
be obstacles to development and deployment.

Going back to clock and power domain examples above:  it's *easy* to see how
those could be delivered using the current clock framework.  There are a LOT of
folk who are familiar with those interfaces -- today.  So unless there really
is no way to address the issue with the current APIs, it's best to avoid
creating new interfaces for those issues.

> > > Plus, remember that the clock framework is an interface ... so by
> > > definition, it has no code associated with it.  Hence no duplication
> > > of code is possible... at least at this hand-wavey "concept" level.
> > > Possibly a given implementation has scope for code sharing; but I
> > > doubt it.  Code behind a given implementation of the clock interface
> > > is invariably quite slim.
> > 
> > and invariably looks like a hack and still duplicate clock tree building and 
> > traversing code. 

Most cases I've seen build the clock tree with a bunch of static struct
declarations, so the building is all at compile time ... and could hardly
duplicate something on another platform, since it's all platform-specific.
And as noted above, traversal code is trivial ... 

> > Dependencies which exist in modern hw between clocks, clocks  
> > and voltages require a more straight and standard technique to be set up for 
> > implementing clk/vltg/name_it framework.

I think Linus has pointed out that cut'n'paste is a very straightforward
and standard technique.  :)

The clock-to-clock dependencies are handled by the clock framework
already.  As are at least some cases of power domains.

DVFS -- cpufreq or otherwise -- is the primary example of a case where
coupling is needed between clocks and voltages.  These PM discussions
lately seem to dance around DVFS without saying so explicitly ... seemingly,
to avoid the "so why not just fix cpufreq" discussions that would (should!)
naturally follow.

My own two cents on the cpufreq thing:  (a) it should support the driver
model better; (b) a generic "run from SRAM" solution would be an aid to
that and other PM code; (c) without clock framework support for drivers
being able to reject changes or else re-clock, changing base clocks is
often impractical; (d) I don't recall seeing anyone really answer the
"so why not ust fix cpufreq" question.

> > Moving most of the code to be arch  
> > independent and setting a clear rules on how a clock/vltg tree configuration 
> > would look like while just looking at a hw manual would help.
>
> I think clocks and voltages providers have very different behaviours: for example
> a voltage src can sometimes be put in quiescent state, where the voltage value is
> preserved, but the max current is significantly reduced, therefore minimizing the
> leakage. I wouldn't welcome such functionality to be merged with clock handling.

Another good example of these APIs being domain-specific ... though
in a way it's a special case of the one I gave above.  In a similar
vein, lower current (but not quiescent) modes may be able to use
alternate schemes for voltage regulation.  The regulation may need
software inputs to switch schemes.

> > > If a clock being enabled implies a power or voltage domain being active,
> > > there's no reason that constraint shouldn't be enforced by whatever
> > > implementation a given platform uses.
> > 
> > true. but it's the same functionality required on different arches. and it can 
> > be done in arch independent way without penalties for arches which do not 
> > require the functionality. What's the rationale to move it down to arch specific 
> > code then?
> 
> I'd rather ask: what's the benefit of merging apples and oranges?

Someone may be tiring of a steady diet, and wants to become a chef?  :)

I'll repeat that power and clock domain examples above:  the current
clock framework is **ALREADY** arch-independant in exactly the way
Eugeny sketched, for several platforms.

> > > And having a generic -- basically untyped -- notion of "parameter"
> > > seems significantly less good than having a typed notion, with
> > > type-specific operations.  
> > 
> > there is no type-specific operations. clk_set_rate() and vltg_set() basically do 
> > the same thing - set a value for a pm resource provided by clock or voltage 
> > device (regulator). The difference is that output of clock regulator is measured 
> > in Hz and named 'frequesncy' and output of voltage regulator os measured in mV 
> > and named 'voltage' which actually does not matter from API POV. So, i see more 
> > sense in having param_set(), parame_set_state() (set_state primitive for idle 
> > states) rather than in
>
> See my comment above about different, peculiar, behaviour of voltages.
> And that's just an example.

One of several, I added more ...

> > clk_set_rate(), clk_set_state()
> > vltg_set(), vltg_set_state()

vtg_allocate_budget() would have no clock domain analogue,
nor would vtg_free_budget() ...

> > and, our analysis shows that you would end up with a separate type for domains, 
> > so it would be at least
> > domain_set_state() as well.

See my comments above re clock and voltage domains, applicable to
at least half a dozen platforms.

Until you share your analysis, so we can see what we agree and disagree
with, it's not helpful for you to claim that it supports your argument.

> > > Typed notions are easier to understand,
> > > read, and maintain.
> > 
> > common, after tricks with spinlocks in [merged!] RT patches? ;)

I don't follow this comment.  I'm still waiting to see the NO_HZ stuff
become as stable and fast as the previous NO_IDLE_HZ stuff ... but
none of that seems to relate to type-unsafe programming interfaces.

- Dave

> > 
> > Eugeny
> > 
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxx
https://lists.osdl.org/mailman/listinfo/linux-pm