On Tue, 2023-12-12 at 15:27 +0100, Vincent Guittot wrote:
> Provide to the scheduler a feedback about the temporary max available
> capacity. Unlike arch_update_thermal_pressure, this doesn't need to be
> filtered as the pressure will happen for dozens ms or more.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> ---
>  drivers/cpufreq/cpufreq.c | 48 +++++++++++++++++++++++++++++++++++++++
>  include/linux/cpufreq.h   | 10 ++++++++
>  2 files changed, 58 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 44db4f59c4cc..7d5f71be8d29 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2563,6 +2563,50 @@ int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu)
>  }
>  EXPORT_SYMBOL(cpufreq_get_policy);
>
> +DEFINE_PER_CPU(unsigned long, cpufreq_pressure);
> +EXPORT_PER_CPU_SYMBOL_GPL(cpufreq_pressure);
> +
> +/**
> + * cpufreq_update_pressure() - Update cpufreq pressure for CPUs
> + * @cpus        : The related CPUs for which max capacity has been reduced
> + * @capped_freq : The maximum allowed frequency that CPUs can run at
> + *
> + * Update the value of cpufreq pressure for all @cpus in the mask. The
> + * cpumask should include all (online+offline) affected CPUs, to avoid
> + * operating on stale data when hot-plug is used for some CPUs. The
> + * @capped_freq reflects the currently allowed max CPUs frequency due to
> + * freq_qos capping. It might be also a boost frequency value, which is bigger
> + * than the internal 'capacity_freq_ref' max frequency. In such case the
> + * pressure value should simply be removed, since this is an indication that
> + * there is no capping. The @capped_freq must be provided in kHz.
> + */
> +static void cpufreq_update_pressure(const struct cpumask *cpus,
> +				      unsigned long capped_freq)
> +{
> +	unsigned long max_capacity, capacity, pressure;
> +	u32 max_freq;
> +	int cpu;
> +
> +	cpu = cpumask_first(cpus);
> +	max_capacity = arch_scale_cpu_capacity(cpu);
> +	max_freq = arch_scale_freq_ref(cpu);
> +
> +	/*
> +	 * Handle properly the boost frequencies, which should simply clean
> +	 * the thermal pressure value.
> +	 */
> +	if (max_freq <= capped_freq)
> +		capacity = max_capacity;
> +	else
> +		capacity = mult_frac(max_capacity, capped_freq, max_freq);
> +
> +	pressure = max_capacity - capacity;
> +
> +
> +	for_each_cpu(cpu, cpus)
> +		WRITE_ONCE(per_cpu(cpufreq_pressure, cpu), pressure);

It seems the pressure value computed from the first CPU in the mask is
applied to all CPUs in the mask. Will this be valid for non-homogeneous
CPUs that could have different max_freq and max_capacity?

Tim
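
To make the concern concrete, here is a small user-space sketch of the same
arithmetic. It is not kernel code: cpufreq_pressure_sketch() and
mult_frac_ul() are hypothetical names, the latter written to mirror the
rounding of the kernel's mult_frac(), and the capacity and frequency values
mentioned below are made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's mult_frac(x, n, d): computes
 * x * n / d by splitting x into quotient and remainder first, so the
 * intermediate product stays small.
 */
static unsigned long mult_frac_ul(unsigned long x, unsigned long n,
				  unsigned long d)
{
	unsigned long q, r;

	assert(d != 0);	/* a zero reference frequency would divide by zero */
	q = x / d;
	r = x % d;
	return q * n + r * n / d;
}

/*
 * Same calculation as the quoted patch, but taking one CPU's values as
 * explicit parameters instead of reading them from the first CPU of a
 * mask. Frequencies are in kHz.
 */
static unsigned long cpufreq_pressure_sketch(unsigned long max_capacity,
					     unsigned long max_freq,
					     unsigned long capped_freq)
{
	unsigned long capacity;

	/* A cap at or above the reference frequency means no pressure. */
	if (max_freq <= capped_freq)
		capacity = max_capacity;
	else
		capacity = mult_frac_ul(max_capacity, capped_freq, max_freq);

	return max_capacity - capacity;
}
```

With a 1000000 kHz cap, an assumed CPU with capacity 1024 and a 2000000 kHz
reference frequency gets a pressure of 512, while an assumed CPU with
capacity 512 and a 1500000 kHz reference gets 171. If two such CPUs could
ever appear in the same mask, writing the first CPU's value to both would be
wrong for one of them; if the mask only ever covers CPUs with identical
values, the shortcut is harmless.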