On Tue 18 Jan 22:40 PST 2022, Viresh Kumar wrote:

> On 19-01-22, 12:05, Viresh Kumar wrote:
> > policy->cpus keeps on changing with CPU hotplug and this can leave
> > your platform in an inconsistent state. For example, in case where you
> > offline a CPU from policy, other CPUs get their thermal pressure
> > updated, online the CPU back and all CPUs of a policy don't have the
> > same settings anymore.
> >

Oh, I didn't know that. Then my proposal doesn't seem that awesome.

> > There are few things we can do here now:
> >
> > - Check for empty related_cpus and return early. Since related_cpus is
> >   updated only once, this shall work just fine and must not be racy.
> >
> >   While at it, I think we can also do something like this in
> >   topology_update_thermal_pressure() instead:
> >
> >         cpu = cpumask_first(cpus);
> >         if (unlikely(cpu >= NR_CPUS))
> >                 return;
> >
> > - And while writing this email, I dropped all other ideas in favor of
> >   change to topology_update_thermal_pressure() :)
>
> And then I saw your second patch, which looks good as otherwise we
> will not be able to catch the bug in our system where we are sending
> the empty cpumask :)
>
> So the other idea is:
>
> - Revert, or bring back a new version of this and register the
>   interrupt from there. But that is also not a very clean solution.
>
>   commit 4bf8e582119e ("cpufreq: Remove ready() callback")
>

We could do this and keep the interrupt disabled until we hit ready().

But I found the resulting issue non-trivial to debug, so I would prefer
if arch_update_thermal_pressure() dealt with the empty cpumask.

So as you suggest in your first reply, I'll respin the second patch
alone, without the WARN_ON().

Thanks,
Bjorn
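
For illustration only, the early return on an empty cpumask discussed above
could look roughly like the userspace sketch below. This is not the kernel
code: the 64-bit cpumask_t, the small NR_CPUS, the thermal_pressure[] array
and update_thermal_pressure() are all hypothetical stand-ins, and the only
behaviour assumed from the kernel is that cpumask_first() on an empty mask
yields an index that is not a valid CPU.

```c
#include <stdint.h>

/* Hypothetical stand-ins for kernel definitions, for illustration only. */
#define NR_CPUS 8

typedef uint64_t cpumask_t;	/* bit n set => CPU n is in the mask */

/* Mock per-CPU thermal pressure storage (assumption, not the kernel's). */
unsigned long thermal_pressure[NR_CPUS];

/* Return the index of the first set bit, or NR_CPUS if the mask is empty. */
unsigned int cpumask_first(cpumask_t mask)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1ULL << cpu))
			return cpu;
	return NR_CPUS;
}

/*
 * Apply 'pressure' to every CPU in 'cpus'.
 * Returns the number of CPUs updated; 0 means the mask was empty and
 * the function bailed out before touching any per-CPU state.
 */
int update_thermal_pressure(cpumask_t cpus, unsigned long pressure)
{
	unsigned int cpu = cpumask_first(cpus);
	int updated = 0;

	/* Empty mask: nothing sensible to index with, so return early. */
	if (cpu >= NR_CPUS)
		return 0;

	for (; cpu < NR_CPUS; cpu++) {
		if (cpus & (1ULL << cpu)) {
			thermal_pressure[cpu] = pressure;
			updated++;
		}
	}
	return updated;
}
```

With this shape a caller that passes an empty mask (for example from an
interrupt that fires before the mask has been populated) gets a no-op
instead of an out-of-range per-CPU access.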