Re: [PATCH v5 2/3] PM / DEVFREQ: add basic governors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 17, 2011 at 3:11 AM, Turquette, Mike <mturquette@xxxxxx> wrote:
> On Tue, Aug 16, 2011 at 1:52 AM, MyungJoo Ham <myungjoo.ham@xxxxxxxxxxx> wrote:
>>
>> First, in case where we want to let each DVFS-capable device have
>> exact polling frequency (up to jiffy resolution), we only need to set
>> polling_interval = jiffies_to_msecs(1);.
>
> That requires a source code change for anyone that wants to do it.  My
> main complaint with this method is that it is restrictive to begin
> with and the whole method for determining the next_polling time
> reproduces what workqueues already give us.
>
>> In case of CPUFREQ, there would be only one polling loop at most for
>> each core. However, in case of DEVFREQ, there could be multiple
>> polling loops at a core if CPUFREQ-like looping logic is introduced.
>> Why don't we reduce that overhead while their function is same, it is
>> easily doable, and it reduces redundancy?
>
> I'm afraid I don't follow.  I was thinking of having a single wq loop
> for each device.  Under what conditions would a single device have
> multiple wq loops operating against it?

I meant a single wq loop for each device and multiple DEVFREQ devices
for a system.

I aggregated multiple instances of DEVFREQ polling into one polling loop because
1. remove redundant polling loops,
2. simplify governors implementation; I'm presuming that some devices
(especially GPUs and MMC hosts) might want their own custom governors
although the governors won't be complex, and
3. reduce overhead.

I don't see any benefit of looping at each DEVFREQ device, yet.

The case of CPUFREQ is a bit different because each CPUFREQ device
(CPU) is capable of looping itself and each looping should represent
one CPU. Each looping device (CPU) will have only up to one CPUFREQ
polling loop and each instance of CPUFREQ should be executed on the
corresponding CPU in order to avoid being slept while the represented
CPU is running.

>
>>>> When a user updates OPP entries (enable/disable/add), OPP framework
>>>> automatically notifies DEVFREQ to update operating frequency
>>>> accordingly. Thus, DEVFREQ users (device drivers) do not need to update
>>>
>>> It would be nice if OPP library "notified" devfreq but it does not
>>> today.  OPP library needs notifiers and devfreq can provide handlers
>>> for them.
>>
>> That's why devfreq_update() is added in the patch. While DEVFREQ is
>> the only one requiring notifications from OPP, do you think we may
>> incur the overhead of notifier at OPP by replacing devfreq_update with
>> notifier? If we somehow add another module that requires notifications
>> from OPP for frequency availability changes, we will need to implement
>> notifier at OPP side, but not just yet, I guess: (discussed before at
>> https://lists.linux-foundation.org/pipermail/linux-pm/2011-July/032053.html
>> )
>
> Reading that thread makes me think that we really should implement
> notifiers in the OPP library.  An obvious user of OPP notifiers would
> be CPUfreq.  I think it is safe to say that there may be
> implementations of devfreq and CPUfreq that live side-by-side in the
> near future; OPPs might be enabled/disabled dynamically, which means
> both of them need callbacks.  Better to abstract it out early, I
> think.

Ok, assuming that there would be another requesting notifications from
OPP, I've added "opp_get_notifier()" that returns struct
srcu_notifier_head * removing devfreq_update() at patchset v6
candidate.

>
>>>
>> []
>>>> ---
>>>>  drivers/base/power/devfreq.c |  100 ++++++++++++++++++++++++++++++++++++++++++
>>>>  include/linux/devfreq.h      |    8 +++
>>>>  2 files changed, 108 insertions(+), 0 deletions(-)
>>>
>>> Governors should be split out into their own file, especially since
>>> they need to grow to include polling/queueing logic.
>>
>> We will need to decide where to settle devfreq core, drivers, and
>> governors first. Would /drivers/devfreq/ be appropriate?
>
> I think GKH already ACK'd drivers/devfreq in a previous thread:
> https://lists.linux-foundation.org/pipermail/linux-pm/2011-August/032537.html

Yup. I'm moving them to /drivers/devfreq/

>
>> []
>>>> +
>>>> +       /* Set MAX if it's busy enough */
>>>> +       if (stat.busy_time * 100 >
>>>> +           stat.total_time * DFSO_UPTHRESHOLD) {
>>>
>>> Thresholds should not be constants, but should be tuneable parameters,
>>> per-device.  This is yet another reason for revising the existing
>>> relationship between devfreq core code, governors and devices.
>>>
>>
>> I agree. I think I should add governor specific "setup" value at
>> devfreq_add_device(); modifying the interface from
>>
>> extern int devfreq_add_device(struct device *dev, struct
>> devfreq_dev_profile *profile, struct devfreq_governor *governor);
>> ==>
>> extern int devfreq_add_device(struct device *dev, struct
>> devfreq_dev_profile *profile, struct devfreq_governor *governor, void
>> *gov_data);
>>
>> where gov_data is fed to struct devfreq's data field.
>
> It would be nice for the threshold values to be run-time tunable via
> sysfs.  CPUfreq does this well today for ondemand/conservative
> governors and it really helps when doing power/performance tuning.
>
> Regards,
> Mike
>

This will be done with the next revision.

Thank you.

MyungJoo

>>>> +               *freq = UINT_MAX;
>>>> +               return 0;
>>>> +       }
>>>> +
>>>> +       /* Set MAX if we do not know the initial frequency */
>>>> +       if (stat.current_frequency == 0) {
>>>> +               *freq = UINT_MAX;
>>>> +               return 0;
>>>> +       }
>>>> +
>>>> +       /* Keep the current frequency */
>>>> +       if (stat.busy_time * 100 >
>>>> +           stat.total_time * (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL)) {
>>>
>>> Same as above.
>> Yes.
>>
>>>
>>>> +               *freq = stat.current_frequency;
>>>> +               return 0;
>>>> +       }
>>>> +
>>>> +       /* Set the desired frequency based on the load */
>>>> +       a = stat.busy_time;
>>>> +       a *= stat.current_frequency;
>>>> +       b = div_u64(a, stat.total_time);
>>>> +       b *= 100;
>>>> +       b = div_u64(b, (DFSO_UPTHRESHOLD - DFSO_DOWNDIFFERENCTIAL / 2));
>>>
>>> Same as above.
>> Yes.
>>
>>>
>>> Regards,
>>> Mike
>>>
>>
>>
>> Cheers!
>> MyungJoo
>>
>> --
>> MyungJoo Ham (함명주), Ph.D.
>> Mobile Software Platform Lab,
>> Digital Media and Communications (DMC) Business
>> Samsung Electronics
>> cell: 82-10-6714-2858
>>
>



-- 
MyungJoo Ham, Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/linux-pm



[Index of Archives]     [Linux ACPI]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [CPU Freq]     [Kernel Newbies]     [Fedora Kernel]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux