Re: [PATCH v4 0/3] DEVFREQ, DVFS framework for non-CPU devices

"Turquette, Mike" <mturquette@xxxxxx> · Mon, 1 Aug 2011 14:47:14 -0700

On Sat, Jul 30, 2011 at 2:23 PM, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
> On Saturday, July 30, 2011, Turquette, Mike wrote:
>> On Fri, Jul 29, 2011 at 2:10 AM, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
>> > On Friday, July 29, 2011, Turquette, Mike wrote:
>> >> On Thu, Jul 28, 2011 at 3:10 PM, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
>> >> > On Friday, July 15, 2011, MyungJoo Ham wrote:
>> >> >> For a usage example, please look at
>> >> >> http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/devfreq
>> >> >>
>> >> >> In the above git tree, DVFS (dynamic voltage and frequency scaling) mechanism
>> >> >> is applied to the memory bus of Exynos4210 for Exynos4210-NURI boards.
>> >> >> In the example, the LPDDR2 DRAM frequency changes between 133, 266, and 400MHz
>> >> >> and other related clocks simply follow the determined DDR RAM clock.
>> >> >>
>> >> >> The DEVFREQ driver for Exynos4210 memory bus is at
>> >> >> /arch/arm/mach-exynos4/devfreq_bus.c in the git tree.
>> >> >>
>> >> >> MyungJoo Ham (3):
>> >> >>   PM: Introduce DEVFREQ: generic DVFS framework with device-specific
>> >> >>     OPPs
>> >> >>   PM / DEVFREQ: add example governors
>> >> >>   PM / DEVFREQ: add sysfs interface (including user tickling)
>> >> >
>> >> > OK, I'm going to take the patches for 3.2.
>> >>
>> >> Have any other platforms signed up to use this mechanism to manage
>> >> their peripheral DVFS?
>> >
>> > Not that I know of, but one initial user is sufficient for me.
>> > So if you have anything _against_ the patches, please speak up.
>>
>> I do have some concerns.  Let me start by saying that I'm defining a
>> "governor" as some active piece of executing code, probably a looping
>> workqueue that inspects activity/idleness of a device and then makes a
>> determination regarding clock frequency.
>>
>> devfreq seems to be good framework for creating DVFS governors.
>> However I think that most scalable devices on an SoC do *not* need a
>> governor, and many scalable devices won't have performance counters or
>> any other way to implement such introspection.
>
> OK, so I'd like to see what the author of the patch series has to say
> in the face of your comments below.
>
>> Some examples include a MMC controller, which might change its clock
>> rate depending on the class of card that the user has inserted.  Or
>> even a "smartish" device like a GPU lacking performance counters; it's
>> driver will ramp up frequency when there is work to be done and kick
>> off a timeout.  If no new work comes in before the timeout then the
>> driver will drop the frequency.
>>
>> A governor is not required in these cases (as they are event driven)
>> and devfreq is quite heavyweight for such applications.  What is
>> needed is a QoS-style software layer that allows throughput requests
>> to be made from an initiator device towards a target device.  This
>> layer should aggregate requests since many initator devices may make
>> requests to the same target device.  This layer I'm describing, which
>> does not exist today, should be where the actual DVFS transition takes
>> place.  That could take the form of a clk_set_rate call in the clock
>> framework (as described by Colin in V1 of this series), or some other
>> not-yet-realized dvfs_set_opp ,or something like Jean Pihet's
>> per-device PM QoS patches or whatever.  For the purposes of this email
>> I don't really care which framework implements the QoS request
>> aggregation.
>>
>> The point of describing this non-existant API is that devfreq should
>> really be just another input into it.  A governor that can measure bus
>> saturation is really cool, but it may not yield optimal results
>> compared to several drivers which make QoS-style requests and insure
>> that performance is guaranteed for their particular needs during their
>> transactions.  The good news is that we don't have to choose between
>> performance counter introspection and software QoS requests: both the
>> driver requests and the governor should all feed as inputs into the
>> QoS-style DVFS mechanism.
>>
>> Taking that logic to its inevitable conclusion, tickle doesn't belong
>> inside the governor at all.  If some device X wants to ramp up the
>> frequency of device Y, it should just make a QoS-style throughput
>> request towards device Y, possibly with a timeout (keeping the
>> original idea of tickle intact).  This is entirely a separate idea
>> from a governor's introspective workqueue loop.
>>
>> For userspace, a sysfs entry for tickle would also not feed into the
>> governor, but some dummy struct device *user would probably be the
>> initiator device and it would simply call the QoS-style throughput
>> API.
>>
>> In summary my objections to this series are:
>> 1) devfreq should not be the *final* software layer to invoke a DVFS
>> transition as it has not taken all constraints into account.
>> 2) a devfreq governor represents just one constraint out of many to be
>> considered for any given scalable device.
>>
>> My objection to these patches getting merged is that I think they are
>> a bit ahead of their time.
>
> Still, we're merging quite a number of patches being ahead of their time.
> The resulting code is then modified as we learn what's wrong with it or
> how to improve it.
>
> Why exactly do you think this approach will not work in this particular case?

I shouldn't have objected to the patches being merged, because I think
they are OK for a subset of the bigger DVFS problem.  The "governor"
method seems fine for devices with performance counters and whose
drivers do not express performance constraints.

>> We need to know what the real DVFS API looks like underneath devfreq first,
>> since devfreq should really be built on top of it.
>
> Well, quite frankly, if we generally adopted that point of view, much of the
> useful functionality we have in the kernel right now wouldn't be merged at all.

Yes yes, I got ahead of myself; my real goal is to restart the
discussion that popped up in the V1 patchset regarding some API for
handling clock/voltage scaling for DVFS transitions.  However it
doesn't mean that these patches need to be blocked.

I do have an overall concern about the approach mentioned by MyungJoo
where drivers start enabling/disabling OPPs and then the governors
compensate by raising frequency/voltage the next time their workqueues
loop around.  That seems entirely backwards for devices that express
performance needs on an event-driven basis, so my concerns are more
for how these patches *might* be used in the future, and less about
how they look right now.

We still don't have a good way to manage DVFS transitions and
aggregate QoS requests for an SoC.  My hope is when that problem gets
solved devfreq will use those new APIs in its ->target function.  I
also hope that too many drivers won't start using "tickle" as a
substitute for a real DVFS API.

Regards,
Mike

> Think of the USB subsystem, for one example, that has been rewritten from
> scratch no fewer that three times.
>
> The code appears to be reasonably isolated and simple enough to merge.
> There is a subsystem wanting to use it and I don't see anyone forcing
> anybody else to adopt it.  If it's not suitable to you, you won't be using it,
> plain and simple.  And if you come up with some better code to replace it,
> I won't have any problems with taking that too.
>
> Thanks,
> Rafael
>
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/linux-pm