On Fri 28 Jan 02:39 PST 2022, Lukasz Luba wrote:

>
>
> On 1/28/22 3:25 AM, Bjorn Andersson wrote:
> > In the event that the SoC is under thermal pressure while booting it's
> > possible for the dcvs notification to happen inbetween the cpufreq
> > framework calling init and it actually updating the policy's
> > related_cpus cpumask.
> >
> > Prior to the introduction of the thermal pressure update helper an empty
> > cpumask would simply result in the thermal pressure of no cpus being
> > updated, but the new code will attempt to dereference an invalid per_cpu
> > variable.
>
> Just to confirm, is that per-cpu var the 'policy->related_cpus' in this
> driver?
>

Correct, we boot under thermal pressure, so the interrupt fires before
we return from "init", which means that related_cpus is still 0.

> >
> > Avoid this problem by using the newly reintroduced "ready" callback, to
> > postpone enabling the IRQ until the related_cpus cpumask is filled in.
> >
> > Fixes: 0258cb19c77d ("cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function")
>
> You have 'Fixes' tagging here, which might be picked by the stable tree.
> The code uses the reverted callback .ready(), which might be missing
> there (since patch 1/2 doesn't have tagging). This patch looks like a
> proper fix for the root cause.
>

Yes, the pair would need to be picked up.

> Anyway, I'm going to send a patch, which adds a check for null cpumask
> in the topology_update_thermal_pressure()
> It was removed after the review comments:
> https://lore.kernel.org/linux-pm/20211028054459.dve6s2my2tq7odem@vireshk-i7/
>

I attempted that in v1:

https://lore.kernel.org/all/20220118185612.2067031-2-bjorn.andersson@xxxxxxxxxx/

And while patch 1 is broken, I think Greg and Sudeep made it clear that
they didn't want a condition to guard against the caller passing cpus of
0.

That's why I in v2 reverted to postpone the thermal pressure IRQ until
cpufreq is "ready".

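To illustrate the ordering problem, here is a rough user-space model. It
is only a sketch and not the driver code: struct model_policy,
model_throttle_irq() and model_boot() are made-up names, the cpumask is
modelled as a plain bitmask, and the invalid per_cpu dereference is
modelled as a counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the cpufreq policy, not the kernel struct. */
struct model_policy {
	unsigned long related_cpus;	/* 0 means "not filled in yet" */
	bool irq_enabled;
	int updates;			/* pressure updates on a valid mask */
	int bad_derefs;			/* updates attempted on an empty mask */
};

/* Models the dcvs/throttle interrupt handler. */
static void model_throttle_irq(struct model_policy *p)
{
	if (!p->irq_enabled)
		return;			/* IRQ still disabled, handler never runs */

	if (p->related_cpus == 0) {
		p->bad_derefs++;	/* the case this patch avoids */
		return;
	}

	p->updates++;
}

/*
 * Models one boot under thermal pressure. Returns how many times the
 * handler ran with an empty related_cpus mask.
 */
static int model_boot(bool enable_irq_in_init)
{
	struct model_policy p = { 0 };

	/* "init": the IRQ is set up; enabling it here is the racy variant */
	p.irq_enabled = enable_irq_in_init;

	/* thermal pressure during boot: the interrupt tries to fire */
	model_throttle_irq(&p);

	/* the cpufreq core fills in related_cpus after init returns */
	p.related_cpus = 0xf;

	/* "ready": from here on it is safe to take the interrupt */
	p.irq_enabled = true;
	model_throttle_irq(&p);

	return p.bad_derefs;
}
```

In this model, model_boot(true) hits the empty-mask case once, while
model_boot(false) never does, which is the effect of postponing the IRQ
enable to "ready".
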
Regards,
Bjorn

> I'll also push that change for the stable tree.
>
> Regards,
> Lukasz