On 09/01/2025 16:18, Konrad Dybcio wrote:
On 8.01.2025 10:15 AM, Neil Armstrong wrote:
On 08/01/2025 04:11, Bjorn Andersson wrote:
On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
Hi,
On 07/01/2025 00:39, Bjorn Andersson wrote:
On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in a
hardware-controlled loop using the LMH and EPSS blocks, with constraints and
OPPs programmed in the board firmware.
Since the hardware does a better job of keeping the CPU temperatures
within an acceptable range by taking into account more parameters, like the
die characteristics or other factory-fused values, it makes no sense to try
and reproduce a similar set of constraints with the Linux cpufreq thermal
core.
In addition, the tsens IP is responsible for monitoring the temperature
across the SoC, and the current settings will heavily trigger the tsens
UP/LOW interrupts if the CPU temperatures reach the hardware thermal
constraints, which are currently defined in the DT. And since the CPUs
are not hooked into the thermal trip points, the potential interrupts and
calculations are a waste of system resources.
Instead, set higher temperatures in the CPU trip points, and hook a CPU
idle injector with a 100% duty cycle at the highest trip point in case
the hardware DCVS cannot handle the temperature surge, and try our best to
avoid reaching the critical temperature trip point, which should trigger an
inevitable thermal shutdown.
Are you able to hit these higher temperatures? Do you have some test
case where the idle-injection is shown to be successful in blocking us from
reaching the critical temp?
No, I've been able to test idle-injection and observed a noticeable effect,
but I had to set a lower trip point. Do you know how I can easily "block"
LMH/EPSS from scaling down and let the temperature go higher?
I don't know how to override that configuration.
I'll try to get some answers. SDM845 seems to expose a couple SCM calls for
this purpose and it's already wired up in drivers/thermal/qcom/lmh.c
Would be great, thx
E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
the critical trip for when the hardware fails us.
It's the goal here as well
How about simplifying the patch by removing the idle-injection step and
just relying on LMH/EPSS and the "critical" trip (at least until someone
can prove that there's value in the extra mitigation)?
OK, but I see value in this idle injection mitigation: in case LMH/EPSS
fails, the only lever HLOS still controls is to stop scheduling tasks,
since the frequency won't be able to scale anymore.
If LMH fails, your SoC is probably cooked already, anyway :(
I'm not sure why idle injection isn't enabled by default if no other cooling
methods are found. Perhaps that could be discussed with some thermal folks..
Yeah, this is a good question; this should probably be the default "hot" behaviour.
Anyway, I agree it can be added later on, so should I drop the 2 trip points
and only leave the critical one ?
I think sticking with critical=Tjmax + critical-action = "reboot" may be the
way to go here.
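As a rough sketch only (not taken from the patch), a critical-only zone of
that shape could look something like the fragment below. The zone name, the
tsens phandle and sensor index, and the temperature value are placeholders,
and it assumes the "critical-action" property from the generic thermal-zones
binding is supported by the target kernel:

```dts
/* Sketch: names, sensor index and temperature are hypothetical. */
thermal-zones {
	cpu0-thermal {
		thermal-sensors = <&tsens0 1>;	/* placeholder sensor */
		critical-action = "reboot";	/* reboot instead of the default shutdown */

		trips {
			cpu0-critical {
				temperature = <110000>;	/* millicelsius; stand-in for Tjmax */
				hysteresis = <1000>;
				type = "critical";
			};
		};
	};
};
```

With no passive or hot trips and no cooling-maps, all mitigation below the
critical trip is left to LMH/EPSS.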
We may want to give some folks a heads up, so they can wire up skin sensors
on their devices ahead of these changes landing tree-wide.
Yeah it's also my goal, will respin with only critical.
Thanks,
Neil
Konrad