On 25.10.2019 00:11, Rafael J. Wysocki wrote:
> On Thu, Oct 24, 2019 at 7:47 PM Leonard Crestez <leonard.crestez@xxxxxxx> wrote:
>>
>> On 24.10.2019 16:42, Rafael J. Wysocki wrote:
>>> On Wed, Oct 23, 2019 at 3:33 PM Leonard Crestez <leonard.crestez@xxxxxxx> wrote:
>>>>
>>>> On 2019-10-23 11:54 AM, Rafael J. Wysocki wrote:
>>>>> On Wed, Oct 23, 2019 at 4:20 AM Leonard Crestez <leonard.crestez@xxxxxxx> wrote:
>>>>>> On 2019-10-23 1:48 AM, Rafael J. Wysocki wrote:
>>>
>>> [cut]
>>>
>>>>>>> But combining the lists of requests for all the CPUs in a policy
>>>>>>> defeats the idea of automatic aggregation of requests which really is
>>>>>>> what PM QoS is about.
>>>>>>
>>>>>> My primary interest is the "dev" part of dev_pm_qos: making pm_qos
>>>>>> requests tied to a specific device.
>>>>>
>>>>> The list of requests needs to be associated with the user of the
>>>>> effective constraint. If that is the device, it is all good.
>>>>
>>>> The phrase "user of the effective constraint" is somewhat unclear.
>>>
>>> Fair enough, so let me elaborate.
>>>
>>> The effective constraint (i.e. the one resulting from taking all of the
>>> requests in the relevant QoS list into account) affects the selection
>>> of an OPP, so it is natural to associate the QoS list producing it
>>> with a list of OPPs to select. In the cpufreq case, the policy holds
>>> the list of OPPs and so it also should hold the corresponding QoS
>>> lists (for the min and max frequency limits). It "uses" the effective
>>> constraints produced by those QoS lists by preventing the OPPs outside
>>> the range between the min and max values from being selected.
>>>
>>> Essentially, the policy represents a power (clock/voltage) domain with
>>> multiple components (it doesn't matter what they are at this level of
>>> abstraction). While there can be multiple sources of QoS requests
>>> associated with each component, all of these requests ultimately need
>>> to be passed to the domain for aggregation, because that's where the
>>> frequency selection decisions are made and so that's where the
>>> effective constraint value needs to be known. Now, the natural way to
>>> allow requests from multiple sources to be passed for aggregation is
>>> to provide a QoS list that they can be added to. That really is what
>>> PM QoS is for.
>>>
>>>> I'm using the target device as dev for dev_pm_qos, not the requestor.
>>>> This is consistent with how it was used for cpufreq: thermal called
>>>> dev_pm_qos_add_request with dev = cpu_dev, not a thermal sensor or
>>>> anything else.
>>>
>>> Not really, but close. :-)
>>>
>>> Without my series (that is 5.4-rc4, say), the cpu_cooling driver adds
>>> its constraint to the device PM QoS of cpufreq_cdev, which is a special
>>> device created by that driver. That would be fine, except that the
>>> cpufreq core doesn't use that QoS. It uses the device PM QoS of the
>>> policy->cpu device instead. That is, that's where it adds its
>>> notifiers (see cpufreq_policy_alloc()), that's where user space
>>> requests are added (see cpufreq_online()), and (most important) that's
>>> where the effective constraint value is read from (see
>>> cpufreq_set_policy()). That turns out to be problematic (in addition
>>> to the cpu_cooling driver's QoS requests going nowhere), because
>>> confusion ensues if the current policy->cpu goes away.
>>
>> That behavior in cpu_cooling seems like a bug.
>
> Well, kind of. :-)
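
Just to spell out what I mean by "a bug": for the dev_pm_qos aggregation
to do anything useful, the side adding the request and the side reading
the effective value have to refer to the same struct device. Roughly like
this, against the 5.4-rc4 dev_pm_qos calls (the two helpers below are made
up for illustration, error handling and request removal are omitted):

#include <linux/cpu.h>
#include <linux/device.h>
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/pm_qos.h>

/*
 * Producer side (what cpu_cooling wants to do): the request has to be
 * added on the CPU device whose QoS the consumer reads, otherwise it
 * never gets aggregated into the effective constraint.
 */
static int cooling_limit_cpu(unsigned int cpu, struct dev_pm_qos_request *req,
                             s32 max_freq_khz)
{
        struct device *cpu_dev = get_cpu_device(cpu);

        if (!cpu_dev)
                return -ENODEV;

        return dev_pm_qos_add_request(cpu_dev, req,
                                      DEV_PM_QOS_MAX_FREQUENCY, max_freq_khz);
}

/*
 * Consumer side (roughly what cpufreq does when it evaluates the policy):
 * reading the same device returns the aggregated value of all requests
 * added on it, including the one above.
 */
static s32 cpu_effective_max_freq(unsigned int cpu)
{
        struct device *cpu_dev = get_cpu_device(cpu);

        return cpu_dev ? dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY)
                       : S32_MAX;
}

If cpu_cooling adds its request on the cpufreq_cdev device while cpufreq
reads from the policy->cpu device, the two sides never meet.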
>
>>>> However, looking at other dev_pm_qos users there are instances of a
>>>> driver calling dev_pm_qos_add_request on its own device, but this is not
>>>> a strict requirement, correct?
>>>
>>> No, it isn't.
>>>
>>>>>>> There have to be two lists of requests per policy, one for the max and
>>>>>>> one for the min frequency.
>
>>>>>>>> If cpufreq needs a group of CPUs to run at the same frequency then it
>>>>>>>> should deal with this by doing dev_pm_qos_read_frequency on each CPU
>>>>>>>> device and picking a frequency that attempts to satisfy all constraints.
>>>>>>>
>>>>>>> No, that would be combining the requests by hand.
>>>>>>
>>>>>> It's just a loop though.
>>>>>
>>>>> Yes, it is, and needs to be run on every change of an effective
>>>>> constraint for any CPU even if the total effective constraint doesn't
>>>>> change. And, of course, the per-policy user space limits would need
>>>>> to be combined with that by hand.
>>>>>
>>>>> Not particularly straightforward if you ask me.
>>>>
>>>> Well, this cpu-to-policy aggregation could also use a pm_qos_constraints
>>>> object instead of looping.
>>>
>>> Yes, it could, but then somebody would need to add those
>>> "intermediate" requests to a proper policy-level QoS and it would need
>>> an extra notifier invocation to update each of them on a "component"
>>> QoS change.
>>>
>>> This is an interesting idea in case we ever need to improve the
>>> scalability of the QoS lists, but I'd rather use the simpler approach
>>> for now.
>>
>> The advantage I see is reducing the exposure of cpufreq internals.
>
> That can be achieved by providing a helper to add a frequency QoS
> request to the min or max QoS list of the policy covering a given CPU.
> The caller of it would just need to pass the CPU number, a pointer to
> the request struct and the type.
>
> It wasn't necessary to add it at this time, though, and there would be
> the extra complication that the caller would need to know whether or
> not the policy had been created already.

Using dev_pm_qos already provides that, and since the request is tied to
the struct device of the CPU there is no need to know anything about
cpufreq_policy at all. It would just need an additional layer of
aggregation from CPU to cpufreq_policy.
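
Concretely, that extra aggregation layer could be as dumb as the loop
below, run from a notifier registered with dev_pm_qos_add_notifier() on
each CPU device of the policy. Untested sketch: the helper name is made
up, only the max-frequency side is shown and locking is ignored.

#include <linux/cpu.h>
#include <linux/cpufreq.h>
#include <linux/cpumask.h>
#include <linux/kernel.h>
#include <linux/pm_qos.h>

/*
 * Made-up helper: recompute the policy-wide max frequency limit as the
 * most restrictive of the per-CPU effective constraints.  A notifier on
 * each CPU device would call this whenever any CPU's effective
 * DEV_PM_QOS_MAX_FREQUENCY value changes.
 */
static s32 policy_aggregate_max_freq(struct cpufreq_policy *policy)
{
        s32 policy_max = S32_MAX;       /* start from "no constraint" */
        unsigned int cpu;

        for_each_cpu(cpu, policy->cpus) {
                struct device *cpu_dev = get_cpu_device(cpu);

                if (!cpu_dev)
                        continue;

                policy_max = min(policy_max,
                                 dev_pm_qos_read_value(cpu_dev,
                                                       DEV_PM_QOS_MAX_FREQUENCY));
        }

        return policy_max;
}

The combined value would then feed whatever per-policy limits cpufreq
keeps for the governors and sysfs.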
>>>>>>> Well, the cpufreq sysfs is per-policy and not per-CPU and we really
>>>>>>> need a per-policy min and max frequency in cpufreq, for governors etc.
>>>>>>
>>>>>> Aggregation could be performed at two levels:
>>>>>>
>>>>>> 1) Per cpu device (by dev_pm_qos)
>>>>>> 2) Per policy (inside cpufreq)
>>>>>>
>>>>>> The per-cpu dev_pm_qos notifier would just update a per-policy
>>>>>> pm_qos_constraints object. The second step could even be done strictly
>>>>>> inside the cpufreq core using existing pm_qos, no need to invent new
>>>>>> frameworks.
>>>>>>
>>>>>> Maybe dev_pm_qos is not a very good fit for cpufreq because of these
>>>>>> "cpu device versus cpufreq_policy" issues but it makes a ton of sense
>>>>>> for devfreq. Can you maybe hold PATCH 3 from this series pending further
>>>>>> discussion?
>>>>>
>>>>> It can be reverted at any time if need be and in 5.4 that would be dead code.
>>>>
>>>> I guess I can post v10 of my "devfreq pm qos" which starts by reverting
>>>> "PATCH 3" of this series?
>>>
>>> You may do that, but I would consider adding a struct freq_constraints
>>> pointer directly to struct dev_pm_info and using the new frequency QoS
>>> helpers to manage it.
>>>
>>> Arguably, there is no need to bundle that with the rest of device PM
>>> QoS and doing the above would help to avoid some code duplication too.
>>
>> Adding to struct dev_pm_info would increase sizeof(struct device) while
>> dev_pm_qos only allocates memory when constraints are added. My
>> expectation is that very few devices would even have min_freq and
>> max_freq constraints.
>
> Well, fair enough.
>
>> Maybe struct dev_pm_qos could host a "struct freq_constraints freq"
>> instead of two separate "struct pm_qos_constraints min/max_frequency"?
>
> That is possible too.
>
>> This way there would be two users of freq_constraints: cpufreq_policy
>> (which is not a device) and dev_pm_qos.
>>
>> In the future freq_constraints might be extended to implement some logic
>> for conflicts between min_freq and max_freq requests.
>
> Sure.

Posted for review: https://patchwork.kernel.org/cover/11212887/
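
For reference, on the dev_pm_qos side the idea above would look roughly
like this (sketch only; everything except the "freq" member follows the
existing struct dev_pm_qos layout, and the per-attribute request pointers
are left out for brevity):

/* In include/linux/pm_qos.h, conceptually: */
struct dev_pm_qos {
        struct pm_qos_constraints resume_latency;
        struct pm_qos_constraints latency_tolerance;
        struct freq_constraints freq;   /* replaces min_frequency/max_frequency */
        struct pm_qos_flags flags;
        /* ... plus the existing per-attribute request pointers ... */
};

cpufreq_policy would embed its own struct freq_constraints in the same
way, giving the two users mentioned above.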