On 1/28/22 3:25 AM, Bjorn Andersson wrote:
> In the event that the SoC is under thermal pressure while booting it's
> possible for the dcvs notification to happen inbetween the cpufreq
> framework calling init and it actually updating the policy's
> related_cpus cpumask.
>
> Prior to the introduction of the thermal pressure update helper an
> empty cpumask would simply result in the thermal pressure of no cpus
> being updated, but the new code will attempt to dereference an invalid
> per_cpu variable.
Just to confirm, is that per-cpu var the 'policy->related_cpus' in this driver?
> Avoid this problem by using the newly reintroduced "ready" callback, to
> postpone enabling the IRQ until the related_cpus cpumask is filled in.
>
> Fixes: 0258cb19c77d ("cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function")
You have a 'Fixes' tag here, so this patch might be picked up by the
stable tree. The code uses the reintroduced .ready() callback, which
might be missing there (since patch 1/2 doesn't have the tag).

This patch looks like a proper fix for the root cause.

Anyway, I'm going to send a patch which adds a check for a null cpumask
in topology_update_thermal_pressure(). That check was removed after the
review comments:
https://lore.kernel.org/linux-pm/20211028054459.dve6s2my2tq7odem@vireshk-i7/

I'll also push that change for the stable tree.

Regards,
Lukasz