On Thu, Oct 24, 2019 at 7:47 PM Leonard Crestez <leonard.crestez@xxxxxxx> wrote:
>
> On 24.10.2019 16:42, Rafael J. Wysocki wrote:
> > On Wed, Oct 23, 2019 at 3:33 PM Leonard Crestez <leonard.crestez@xxxxxxx> wrote:
> >>
> >> On 2019-10-23 11:54 AM, Rafael J. Wysocki wrote:
> >>> On Wed, Oct 23, 2019 at 4:20 AM Leonard Crestez <leonard.crestez@xxxxxxx> wrote:
> >>>> On 2019-10-23 1:48 AM, Rafael J. Wysocki wrote:
> >
> > [cut]
> >
> >>>>> But combining the lists of requests for all the CPUs in a policy
> >>>>> defeats the idea of automatic aggregation of requests which really is
> >>>>> what PM QoS is about.
> >>>>
> >>>> My primary interest is the "dev" part of dev_pm_qos: making pm_qos
> >>>> requests tied to a specific device.
> >>>
> >>> The list of requests needs to be associated with the user of the
> >>> effective constraint. If that is the device, it is all good.
> >>
> >> The phrase "user of the effective constraint" is somewhat unclear.
> >
> > Fair enough, so let me elaborate.
> >
> > The effective constraint (ie. the one resulting from taking all of the
> > requests in the relevant QoS list into account) affects the selection
> > of an OPP, so it is natural to associate the QoS list producing it
> > with a list of OPPs to select. In the cpufreq case, the policy holds
> > the list of OPPs and so it also should hold the corresponding QoS
> > lists (for the min and max frequency limits). It "uses" the effective
> > constraints produced by those QoS lists by preventing the OPPs out of
> > the range between the min and max values from being selected.
> >
> > Essentially, the policy represents a power (clock/voltage) domain with
> > multiple components (it doesn't matter what they are at this level of
> > abstraction).
> > While there can be multiple sources of QoS requests
> > associated with each component, all of these requests ultimately need
> > to be passed to the domain for aggregation, because that's where the
> > frequency selection decisions are made and so that's where the
> > effective constraint value needs to be known. Now, the natural way to
> > allow requests from multiple sources to be passed for aggregation is
> > to provide a QoS list that they can be added to. That really is what
> > PM QoS is for.
> >
> >> I'm using the target device as dev for dev_pm_qos, not the requestor.
> >> This is consistent with how it was used for cpufreq: thermal called
> >> dev_pm_qos_add_request with dev = cpu_dev, not a thermal sensor or
> >> anything else.
> >
> > Not really, but close. :-)
> >
> > Without my series (that is 5.4-rc4, say), the cpu_cooling driver adds
> > its constraint to the device PM QoS of cpufreq_cdev, which is a special
> > device created by that driver. That would be fine, except that the
> > cpufreq core doesn't use that QoS. It uses the device PM QoS of the
> > policy->cpu device instead. That is, that's where it adds its
> > notifiers (see cpufreq_policy_alloc()), that's where user space
> > requests are added (see cpufreq_online()), and (most important) that's
> > where the effective constraint value is read from (see
> > cpufreq_set_policy()). That turns out to be problematic (in addition
> > to the cpu_cooling driver's QoS requests going nowhere), because
> > confusion ensues if the current policy->cpu goes away.
>
> That behavior in cpu_cooling seems like a bug.

Well, kind of. :-)

> >> However, looking at other dev_pm_qos users, there are instances of a
> >> driver calling dev_pm_qos_add_request on its own device, but this is
> >> not a strict requirement, correct?
> >
> > No, it isn't.
> >
> >>>>> There have to be two lists of requests per policy, one for the max and
> >>>>> one for the min frequency
>
> >>>>>> If cpufreq needs a group of CPUs to run at the same frequency then it
> >>>>>> should deal with this by doing dev_pm_qos_read_frequency on each CPU
> >>>>>> device and picking a frequency that attempts to satisfy all constraints.
> >>>>>
> >>>>> No, that would be combining the requests by hand.
> >>>>
> >>>> It's just a loop though.
> >>>
> >>> Yes, it is, and it needs to be run on every change of an effective
> >>> constraint for any CPU even if the total effective constraint doesn't
> >>> change. And, of course, the per-policy user space limits would need
> >>> to be combined with that by hand.
> >>>
> >>> Not particularly straightforward if you ask me.
> >>
> >> Well, this cpu-to-policy aggregation could also use a pm_qos_constraints
> >> object instead of looping.
> >
> > Yes, it could, but then somebody would need to add those
> > "intermediate" requests to a proper policy-level QoS and it would need
> > an extra notifier invocation to update each of them on a "component"
> > QoS change.
> >
> > This is an interesting idea in case we ever need to improve the
> > scalability of the QoS lists, but I'd rather use the simpler approach
> > for now.
>
> The advantage I see is reducing the exposure of cpufreq internals.

That can be achieved by providing a helper to add a frequency QoS
request to the min or max QoS list of the policy covering a given CPU.
The caller of it would just need to pass the CPU number, a pointer to
the request struct and the type.

It wasn't necessary to add it at this time, though, and there would be
the extra complication that the caller would need to know whether or
not the policy had been created already.

> >>>>> Well, the cpufreq sysfs is per-policy and not per-CPU and we really
> >>>>> need a per-policy min and max frequency in cpufreq, for governors etc.
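[Editor's note: the helper described in words above might look roughly like this. It is only a sketch: freq_qos_add_request(), struct freq_qos_request, the request type enum and the per-policy constraints field are from the frequency QoS series under discussion, while the wrapper name freq_qos_add_request_cpu() is invented here for illustration.]

```c
/*
 * Hypothetical convenience wrapper (sketch only): attach a frequency
 * QoS request to the min or max list of the policy covering @cpu, so
 * the caller never touches cpufreq internals directly.  The -ENODEV
 * path is the complication mentioned above: the policy covering @cpu
 * may not have been created yet.
 */
static int freq_qos_add_request_cpu(unsigned int cpu,
				    struct freq_qos_request *req,
				    enum freq_qos_req_type type, s32 value)
{
	struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
	int ret;

	if (!policy)
		return -ENODEV;

	ret = freq_qos_add_request(&policy->constraints, req, type, value);
	cpufreq_cpu_put(policy);
	return ret;
}
```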
> >>>>
> >>>> Aggregation could be performed at two levels:
> >>>>
> >>>> 1) Per cpu device (by dev_pm_qos)
> >>>> 2) Per policy (inside cpufreq)
> >>>>
> >>>> The per-cpu dev_pm_qos notifier would just update a per-policy
> >>>> pm_qos_constraints object. The second step could even be done strictly
> >>>> inside the cpufreq core using existing pm_qos, no need to invent new
> >>>> frameworks.
> >>>>
> >>>> Maybe dev_pm_qos is not a very good fit for cpufreq because of these
> >>>> "cpu device versus cpufreq_policy" issues but it makes a ton of sense
> >>>> for devfreq. Can you maybe hold PATCH 3 from this series pending
> >>>> further discussion?
> >>>
> >>> It can be reverted at any time if need be and in 5.4 that would be dead code.
> >>
> >> I guess I can post v10 of my "devfreq pm qos" which starts by reverting
> >> "PATCH 3" of this series?
> >
> > You may do that, but I would consider adding a struct freq_constraints
> > pointer directly to struct dev_pm_info and using the new frequency QoS
> > helpers to manage it.
> >
> > Arguably, there is no need to bundle that with the rest of device PM
> > QoS and doing the above would help to avoid some code duplication too.
>
> Adding to struct dev_pm_info would increase sizeof(struct device) while
> dev_pm_qos only allocates memory when constraints are added. My
> expectation is that very few devices would even have min_freq and
> max_freq constraints.

Well, fair enough.

> Maybe struct dev_pm_qos could host a "struct freq_constraints freq"
> instead of two separate "struct pm_qos_constraints min/max_frequency"?

That is possible too.

> This way there would be two users of freq_constraints: cpufreq_policy
> (which is not a device) and dev_pm_qos.
>
> In the future freq_constraints might be extended to implement some logic
> for conflicts between min_freq and max_freq requests.

Sure.

Thanks!
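[Editor's note: for context, the shape being converged on in this exchange looks roughly as follows. The freq_constraints layout is from the frequency QoS series; hosting a single struct freq_constraints inside struct dev_pm_qos is Leonard's suggestion above, shown here only as a sketch, and the trailing "..." members stand for fields omitted for brevity.]

```c
/* Frequency QoS: the min and max request lists bundled together,
 * each with its own notifier head. */
struct freq_constraints {
	struct pm_qos_constraints min_freq;
	struct blocking_notifier_head min_freq_notifiers;
	struct pm_qos_constraints max_freq;
	struct blocking_notifier_head max_freq_notifiers;
};

/* First user: the cpufreq policy (not a device) embeds one directly. */
struct cpufreq_policy {
	/* ... */
	struct freq_constraints constraints;
	/* ... */
};

/* Second (suggested) user: dev_pm_qos hosts one freq_constraints
 * instead of two separate pm_qos_constraints members, and it is still
 * only allocated when a constraint is first added. */
struct dev_pm_qos {
	struct pm_qos_constraints resume_latency;
	struct pm_qos_constraints latency_tolerance;
	struct freq_constraints freq;
	/* ... */
};
```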