On Wed, Oct 23, 2019 at 3:33 PM Leonard Crestez <leonard.crestez@xxxxxxx> wrote:
>
> On 2019-10-23 11:54 AM, Rafael J. Wysocki wrote:
> > On Wed, Oct 23, 2019 at 4:20 AM Leonard Crestez <leonard.crestez@xxxxxxx> wrote:
> >> On 2019-10-23 1:48 AM, Rafael J. Wysocki wrote:

[cut]

> >>> But combining the lists of requests for all the CPUs in a policy
> >>> defeats the idea of automatic aggregation of requests which really is
> >>> what PM QoS is about.
> >>
> >> My primary interest is the "dev" part of dev_pm_qos: making pm_qos
> >> requests tied to a specific device.
> >
> > The list of requests needs to be associated with the user of the
> > effective constraint.  If that is the device, it is all good.
>
> The phrase "user of the effective constraint" is somewhat unclear.

Fair enough, so let me elaborate.

The effective constraint (i.e. the one resulting from taking all of the
requests in the relevant QoS list into account) affects the selection of
an OPP, so it is natural to associate the QoS list producing it with the
list of OPPs to select from.

In the cpufreq case, the policy holds the list of OPPs, so it should
also hold the corresponding QoS lists (for the min and max frequency
limits).  It "uses" the effective constraints produced by those QoS
lists by preventing OPPs outside of the range between the min and the
max from being selected.

Essentially, the policy represents a power (clock/voltage) domain with
multiple components (it doesn't matter what they are at this level of
abstraction).  While there can be multiple sources of QoS requests
associated with each component, all of these requests ultimately need to
be passed to the domain for aggregation, because that's where the
frequency selection decisions are made and so that's where the effective
constraint value needs to be known.

Now, the natural way to allow requests from multiple sources to be
passed for aggregation is to provide a QoS list that they can be added
to.  That really is what PM QoS is for.
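To make this more concrete, here is a rough sketch (untested, and it
assumes that the per-policy QoS lists from my series end up embedded in
struct cpufreq_policy as a struct freq_constraints field called
"constraints"):

#include <linux/cpufreq.h>
#include <linux/kernel.h>
#include <linux/pm_qos.h>

static struct freq_qos_request cooling_max_req;

/*
 * A component (a cooling device, say) adds its request to the
 * policy-level QoS list; PM QoS aggregates it with all of the other
 * requests automatically.
 */
static int cooling_limit_policy(struct cpufreq_policy *policy, s32 max_khz)
{
	return freq_qos_add_request(&policy->constraints, &cooling_max_req,
				    FREQ_QOS_MAX, max_khz);
}

/*
 * The policy, as the user of the effective constraints, reads the
 * aggregated min and max when selecting an OPP.
 */
static void policy_clamp_target(struct cpufreq_policy *policy, s32 *khz)
{
	s32 min = freq_qos_read_value(&policy->constraints, FREQ_QOS_MIN);
	s32 max = freq_qos_read_value(&policy->constraints, FREQ_QOS_MAX);

	*khz = clamp_t(s32, *khz, min, max);
}

Note that the requestors never need to know about each other or about
the other CPUs in the policy; all of the aggregation happens inside PM
QoS as requests are added, updated and removed.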
> I'm using the target device as dev for dev_pm_qos, not the requestor.
> This is consistent with how it was used for cpufreq: thermal called
> dev_pm_qos_add_request with dev = cpu_dev, not a thermal sensor or
> anything else.

Not really, but close. :-)

Without my series (that is 5.4-rc4, say), the cpu_cooling driver adds
its constraint to the device PM QoS of cpufreq_cdev, which is a special
device created by that driver.

That would be fine, except that the cpufreq core doesn't use that QoS.
It uses the device PM QoS of the policy->cpu device instead.  That is,
that's where it adds its notifiers (see cpufreq_policy_alloc()), that's
where user space requests are added (see cpufreq_online()), and (most
important) that's where the effective constraint value is read from
(see cpufreq_set_policy()).

That turns out to be problematic (in addition to the cpu_cooling
driver's QoS requests going nowhere), because confusion ensues if the
current policy->cpu goes away.

> However, looking at other dev_pm_qos users there are instances of a
> driver calling dev_pm_qos_add_request on its own device, but this is
> not a strict requirement, correct?

No, it isn't.

> >>> There have to be two lists of requests per policy, one for the max
> >>> and one for the min frequency.
>
> >>>> If cpufreq needs a group of CPUs to run at the same frequency then
> >>>> it should deal with this by doing dev_pm_qos_read_frequency on each
> >>>> CPU device and picking a frequency that attempts to satisfy all
> >>>> constraints.
> >>>
> >>> No, that would be combining the requests by hand.
> >>
> >> It's just a loop though.
> >
> > Yes, it is, and it needs to be run on every change of an effective
> > constraint for any CPU, even if the total effective constraint doesn't
> > change.  And, of course, the per-policy user space limits would need
> > to be combined with that by hand.
> >
> > Not particularly straightforward if you ask me.
>
> Well, this cpu-to-policy aggregation could also use a pm_qos_constraints
> object instead of looping.

Yes, it could, but then somebody would need to add those "intermediate"
requests to a proper policy-level QoS, and it would need an extra
notifier invocation to update each of them on a "component" QoS change.

This is an interesting idea in case we ever need to improve the
scalability of the QoS lists, but I'd rather use the simpler approach
for now.

[cut]

> >>> Well, the cpufreq sysfs is per-policy and not per-CPU, and we really
> >>> need a per-policy min and max frequency in cpufreq, for governors etc.
> >>
> >> Aggregation could be performed at two levels:
> >>
> >> 1) Per CPU device (by dev_pm_qos)
> >> 2) Per policy (inside cpufreq)
> >>
> >> The per-CPU dev_pm_qos notifier would just update a per-policy
> >> pm_qos_constraints object.  The second step could even be done
> >> strictly inside the cpufreq core using existing pm_qos, no need to
> >> invent new frameworks.
> >>
> >> Maybe dev_pm_qos is not a very good fit for cpufreq because of these
> >> "cpu device versus cpufreq_policy" issues, but it makes a ton of
> >> sense for devfreq.  Can you maybe hold PATCH 3 from this series
> >> pending further discussion?
> >
> > It can be reverted at any time if need be, and in 5.4 that would be
> > dead code.
>
> I guess I can post v10 of my "devfreq pm qos" series, which starts by
> reverting "PATCH 3" of this series?

You may do that, but I would consider adding a struct freq_constraints
pointer directly to struct dev_pm_info and using the new frequency QoS
helpers to manage it, along the lines of the sketch below.

Arguably, there is no need to bundle that with the rest of device PM
QoS, and doing the above would help to avoid some code duplication too.

Thanks!
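The sketch mentioned above (completely untested, and the freq_qos field
below is made up for illustration, it does not exist today):

#include <linux/device.h>
#include <linux/pm_qos.h>

/*
 * Sketch: struct dev_pm_info (in include/linux/pm.h) would grow an
 * optional pointer to the device's frequency QoS, for example:
 *
 *	struct freq_constraints *freq_qos;
 *
 * allocated when the first request is added.  Requestors (devfreq,
 * thermal and so on) would then use the frequency QoS helpers on it
 * directly:
 */
static struct freq_qos_request min_freq_req;

static int add_dev_min_freq_request(struct device *dev, s32 min_khz)
{
	/* Hypothetical field, see the comment above. */
	if (!dev->power.freq_qos)
		return -ENODEV;

	return freq_qos_add_request(dev->power.freq_qos, &min_freq_req,
				    FREQ_QOS_MIN, min_khz);
}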