On 2019-10-23 11:54 AM, Rafael J. Wysocki wrote: > On Wed, Oct 23, 2019 at 4:20 AM Leonard Crestez <leonard.crestez@xxxxxxx> wrote: >> On 2019-10-23 1:48 AM, Rafael J. Wysocki wrote: >>> On Wed, Oct 23, 2019 at 12:06 AM Leonard Crestez >>> <leonard.crestez@xxxxxxx> wrote: >>>> I've been working on a series which add DEV_PM_QOS support to devfreq, >>>> now at v9: >>>> >>>> Your third patch removes DEV_PM_QOS_FREQUENCY_MIN/MAX that my series >>>> depends upon. I found the email on patchwork, hopefully the in-reply-to >>>> header is OK? >>>> >>>> As far as I can tell the replacement ("frequency qos") needs constraints >>>> to be managed outside the device infrastructure and it's not obviously >>>> usable a generic mechanism for making "min_freq/max_freq" requests to a >>>> specific device. >>> >>> You can add a struct freq_constrants pointer to struct dev_pm_info and >>> use it just fine. It doesn't have to be bolted into struct >>> dev_pm_qos. >> >> I'm not sure what you mean by this? min/max_freq was already available >> in dev_pm_qos so it's not clear why it would be moved somewhere else. >> What I'm looking for is a mechanism to make min/max_freq requests on a >> per-device basis and DEV_PM_QOS_MIN_FREQUENCY already did that. >> >> Reuse is good, right? > > But they go away in patch 3 of this series as there are no users in > the tree. Sorry about that. > >>>> I've read a bit through your emails and it seems the problem is that >>>> you're dealing with dev_pm_qos on per-policy basis but each "struct >>>> cpufreq_policy" can cover multiple CPU devices. >>>> >>>> An alternative solution which follows dev_pm_qos would be to add >>>> notifiers for each CPU inside cpufreq_online and cpufreq_offline. This >>>> makes quite a bit of sense because each CPU is a separate "device" with >>>> a possibly distinct list of qos requests. >>> >>> But combining the lists of requests for all the CPUs in a policy >>> defeats the idea of automatic aggregation of requests which really is >>> what PM QoS is about. >> >> My primary interest is the "dev" part of dev_pm_qos: making pm_qos >> requests tied to a specific device. > > The list of requests needs to be associated with the user of the > effective constraint. If that is the device, it is all good. The phrase "user of the effective constraint" is somewhat unclear. I'm using the target device as dev for dev_pm_qos, not the requestor. This is consistent with how it was used for cpufreq: thermal called a dev_pm_qos_add_request on with dev = cpu_dev not a thermal sensor or anything else. However looking at other dev_pm_qos users there are instances of a driver calling dev_pm_qos_add_request on it's own device but this is not a strict requirement, correct? >>> There have to be two lists of requests per policy, one for the max and >>> one for the min frequency > >>>> If cpufreq needs a group of CPUs to run at the same frequency then it >>>> should deal with this by doing dev_pm_qos_read_frequency on each CPU >>>> device and picking a frequency that attempts to satisfy all constraints. >>> >>> No, that would be combining the requests by hand. >> >> It's just a loop though. > > Yes, it is, and needs to be run on every change of an effective > constraint for any CPU even if the total effective constraint doesn't > change. And, of course, the per-policy user space limits would need > to be combined with that by hand. > > Not particularly straightforward if you asked me. Well, this cpu-to-policy aggregation could also use a pm_qos_constraint object instead of looping. > Not to mention the fact that, say, cpu_cooling, has a per-policy list > of requests anyway. > >>>> Handling sysfs min/max_freq through dev_pm_qos would be of dubious >>>> value, though I guess you could register identical requests for each CPU. >>>> >>>> I'm not familiar with what you're trying to accomplish with PM_QOS other >>>> than replace the sysfs min_freq/max_freq files: >>> >>> QoS-based management of the frequency limits is not really needed for >>> that. The real motivation for adding it were things like thermal and >>> platform firmware induced limits that all have their own values to >>> combine with the ones provided by user space. >> >> Current users seem to be thermal-related. Do you care about min/max_freq >> requests from stuff not directly tied to a CPU? > > Yes, I do. > > And they will need to add requests per policy. > >>>> What I want to do is add >>>> a driver using the interconnect driver which translates requests for >>>> "bandwidth-on-a-path" into "frequency-on-a-device". More specifically a >>>> display driver could request bandwidth to RAM and this would be >>>> translated into min frequency for NoC and the DDR controller, both of >>>> which implement scaling via devfreq: >>>> >>>> This is part of an effort to upstream an out-of-tree "busfreq" feature >>>> which allows device device to make "min frequency requests" through an >>>> entirely out-of-tree mechanism. It would also allow finer-grained >>>> scaling that what IMX tree currently support. >>>> >>>> If you're making cpufreq qos constrains be "per-cpufreq-policy" then >>>> it's not clear how you would handle in-kernel constraints from other >>>> subsystems. Would users have to get a pointer to struct cpufreq_policy >>>> and struct freq_constraints? >>> >>> Yes. >>> >>>> That would make object lifetime a nightmare! >>> >>> Why really? It is not much different from the device PM QoS case >> >> Actually, is a simple >>> one-for-one replacement of the former. As it turns out, all of its >>> users have access to a policy object anyway already. >> >> All current users are very closely tied to cpufreq, what I had in mind >> is requests from unrelated subsystems. > > You can use cpufreq policy notifiers for that. Add a request for each > CPU in the policy (or for each related CPU if that is needed) to > policy->constraints on CREATE_POLICY and remove them on REMOVE_POLICY. > That's all you need to do. > > BTW, the original code from Viresh did that already, I haven't changed > it. And it didn't have per-CPU lists of frequency requests for that > matter, it used the ones in policy->cpu as the per-policy lists, which > doesn't work. > >> Browsing through the cpufreq core it seems that it's possible for a >> struct cpufreq_policy to be created and destroyed at various points, the >> simplest example being rmmod/modprobe on a cpufreq driver. >> >> The freq_qos_add_request function grabs a pointer to struct >> freq_constraints, this can become invalid when cpufreq_policy is freed. >> >> I guess all users need to register a CPUFREQ_POLICY_NOTIFIER and make >> sure to freq_qos_add_request every time? > > Yes. > > The policy is the user of the effective constraint anyway and holding > on to a list of requests without a user of the effective constraint > would be, well, not useful. > >> Looking at your [PATCH 2/3] I can't spot any obvious issue, thermal clamping >> code seems to get the appropriate callbacks. >> >>>> But dev_pm_qos solves this by tying to struct device. >> >> The lifetime of "struct device" is already controlled by >> get_device/put_device. > > And why does this matter here? My point is that dev_pm_qos is easier for consumers to use than dealing with cpufreq_policy lifetime and has less exposure to cpufreq implementation details. But all current consumers seem to be appropriately coupled into cpufreq. >>> Well, the cpufreq sysfs is per-policy and not per-CPU and we really >>> need a per-policy min and max frequency in cpufreq, for governors etc. >> >> Aggregation could be performed at two levels: >> >> 1) Per cpu device (by dev_pm_qos) >> 2) Per policy (inside cpufreq) >> >> The per-cpu dev_pm_qos notifier would just update a per-policy >> pm_qos_constraints object. The second step could even be done strictly >> inside the cpufreq core using existing pm_qos, no need to invent new >> frameworks. >> >> Maybe dev_pm_qos is not a very good fit for cpufreq because of these >> "cpu device versus cpufreq_policy" issues but it makes a ton of sense >> for devfreq. Can you maybe hold PATCH 3 from this series pending further >> discussion? > > It can be reverted at any time if need be and in 5.4 that would be dead code. I guess I can post v10 of my "devfreq pm qos" which starts by reverting "PATCH 3" of this series? -- Regards, Leonard