Re: [PATCH v5 1/3] PM: Introduce DEVFREQ: generic DVFS framework with device-specific OPPs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 11, 2011 at 10:00 AM, Turquette, Mike <mturquette@xxxxxx> wrote:
>> +static unsigned int devfreq_interval = 20;
>
> Instead of devfreq_interval, how about devices specify their
> devfreq->profile->polling_ms which gets turned into a jiffies value
> called devfreq->profile->next_monitor?  devfreq->profile->next_monitor
> is a jiffies value some time in the future.
>
> When the monitor is run then devfreq->profile->next_monitor will
> simply be "jiffies + msecs_to_jiffies(devfreq->profile->polling_ms)".
> That would remove the unnecessary interval which is pretty restrictive
> for someone that *really* wants a 50ms polling interval (not a
> multiple of 20).
>
> Then something like the following can be done in devfreq_monitor:
>
>        unsigned long next_monitor = UINT_MAX;
>        for_each_entry(devfreq, &devfreq_list, node) {
>                next_monitor = minimum(next_monitor, devfreq->next_monitor);
>        }
>        queue_delayed_work(devfreq_wq, &devfreq_work, next_monitor);
>

Thanks. In the patchset v6, I'll let it based on simply JIFFY. In the
v6 rc, I've removed devfreq_interval.

>> +
>> +/*
>> + * devfreq_work periodically (given by devfreq_interval) monitors every
>> + * registered device.
>> + */
>> +static bool polling;
>
> Since patchset v5 removes tickle (so as not to duplicate a QoS
> constraint), devfreq should no longer handle any non-polling DVFS
> transitions, which can also be expressed through a QoS constraint.
> Any static constraint requests from other drivers will use the QoS API
> and not devfreq.  As such "polling" can be removed and all the code
> that touches it since polling is assumed.  Why use devfreq if you're
> not going to poll?
>
> This also makes the performance and powersave governors redundant, and
> possibly the userspace governor as well since QoS requests can just as
> easily achieve the sort of constraints desirable for a non-polling
> implementation.
>

I guess this is resolved in the other thread regarding patch 2/3.

> This also touches on another point that I'll elaborate more on below,
> which is that timer queueing/scheduling should be implemented by the
> governors, not the core code.
[]
> Polling should not be implemented in core code.  Instead the devfreq
> devices should define their own delayed_work and governors should
> schedule that work, possibly with governor-specific work queues, or
> using the kernel global workqueue.  E.g: simple-ondemand should have
> its own workqueue and each device have it's own delayed_work;
> something like devfreq->governor->wq and devfreq->work would suffice.
>
> This is better aligned with CPUfreq, which leaves polling details to
> the governors.  For example cpufreq-ondemand has delayed_work for each
> CPU and that governor schedules the work, not core code.  Core code
> handles little more than registration and sysfs events (including
> switching governors).

The first paragraphs in the reply of the thread of patch 2/3 discusses
this issue: in brief,
unlike CPUFREQ, allowing governors to loop itself increases overhead
and duplicated code.

>
> Some other benefits are increased debugging capability (we know which
> governor failed, we know which device failed, etc) by separating the
> work queues and the work structs and it simplifies having to figure
> out some min_polling/next_polling since workqueues do that for us when
> we schedule multiple pieces of work, all with different delays.

At an error in DEVFREQ polling process, the corresponding device's
DEVFREQ is removed from polling and an error message is printed. Thus,
the user should be able to know which device has failed with what
errno already. The governor is designated by DEVFREQ user, so the user
should know which governor is failing by knowing which device is
failing. Or, we may simply add governor name at the error message in
devfreq_monitor().

>
>> +static int devfreq_do(struct devfreq *devfreq)
>> +{
>> +       struct opp *opp;
>> +       unsigned long freq;
>> +       int err;
>> +
>> +       err = devfreq->governor->get_target_freq(devfreq, &freq);
>
> Following along with my comments above, the semantics of calling
> get_target_freq in devfreq_do are backwards.  This code should go in
> the governor and is analogous to the various dbs_check_cpu functions
> in those governors.

By limiting the governors' role as recommending appropriate
frequencies only, we reduce redundant code across governors and
simplifies implementation of governors. As long as we might need to
implement custom governors for different type of devices (especially
for GPUs and MMCs), this can ease the effort later. I don't think we
need to create redundant code here across governors at least at this
stage.

>
>> +       if (err)
>> +               return err;
>> +
>> +       opp = opp_find_freq_ceil(devfreq->dev, &freq);
>
> I know that devfreq was originally developed with OPP as a
> requirement, but there is no reason to include such details in
> devfreq.  Many platforms may not adopt the OPP libary.
>
> How about the governor calls devfreq->profile->get_target_freq in it's
> decision making function (e.g. dbs_check_cpu) and then calls
> devfreq->profile->target with the frequency from that same function,
> only this time passing in the frequency instead of an OPP?
>
> That target function can determine what to do with the frequency.  In
> the case of Exoyns4 the target function can use the OPP library.  In
> the case of a simpler device it might just make a clk_set_rate call.
> This also removes the drama from V1 of this patchset where the use of
> OPP library itself was disputed and allows devfreq to be more generic.
>

A device that is capable of DVFS has multiple pairs of frequency and
voltage. OPP is a data structure to represent multiple pairs of
frequency and voltage of a device. Without OPP, DEVFREQ requires to
implement data structure to represent multiple pairs of frequency and
voltage per device. Why would we implement another data structure that
represents the same thing? Besides, for devices already with OPPs,
their device driver needs to make two copies of data with different
type representing the exactly same things.

Actually, later on (seems not just yet anyway), OPP may need to loosen
up the condition in Kconfig in case non-SoC-dependent devices starting
to use DVFS feature, or even, CPUFREQ may use OPP as its data
structure for listing available frequencies.

>> +               next_poll_min = (next_poll_min > devfreq->next_polling) ?
>> +                               devfreq->next_polling : next_poll_min;
>> +       }
>
> Again, I find this whole method of using devfreq_interval and
> evaluating every single device in a single delayed_work in a single
> workqueue unintuitive.  workqueues already do the job of taking many
> pieces of future work and ordering them based on their delay, so why
> should we replicate that pattern here by grouping all work together
> and manually determining when to run the monitor?


>
>> +       struct devfreq *new_devfreq, *devfreq;
>
> Why use new_devfreq at all?  devfreq can be reused after checking for
> error conditions.

Thanks. I will remove new_devfreq.

>
>> +       if (!IS_ERR(devfreq)) {
>> +               dev_err(dev, "%s: Unable to create devfreq for the device. "
>> +                       "It already has one.\n", __func__);
>
> Put the error string all on one line, even if it goes past 80 chars.

will correct that one.

>> +int devfreq_update(struct device *dev)
>
> OPP library should implement notifiers which devfreq can subscribe to,
> instead of hacking this code into the OPP libary.  I've looped in
> Nishanth Menon, author of OPP library, to comment.
>
>> +       devfreq_update(dev);
>
> Same comment as above.
>
>> +       devfreq_update(dev);
>
> Same comment as above.
>

The next revision won't use devfreq_update. Thanks.


Cheers!
MyungJoo

-- 
MyungJoo Ham (함명주), Ph.D.
Mobile Software Platform Lab,
Digital Media and Communications (DMC) Business
Samsung Electronics
cell: 82-10-6714-2858
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/linux-pm



[Index of Archives]     [Linux ACPI]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [CPU Freq]     [Kernel Newbies]     [Fedora Kernel]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux