On 8.01.2025 10:15 AM, Neil Armstrong wrote:
> On 08/01/2025 04:11, Bjorn Andersson wrote:
>> On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
>>> Hi,
>>>
>>> On 07/01/2025 00:39, Bjorn Andersson wrote:
>>>> On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
>>>>> On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
>>>>> hardware controlled loop using the LMH and EPSS blocks with constraints and
>>>>> OPPs programmed in the board firmware.
>>>>>
>>>>> Since the Hardware does a better job at maintaining the CPUs temperature
>>>>> in an acceptable range by taking in account more parameters like the die
>>>>> characteristics or other factory fused values, it makes no sense to try
>>>>> and reproduce a similar set of constraints with the Linux cpufreq thermal
>>>>> core.
>>>>>
>>>>> In addition, the tsens IP is responsible for monitoring the temperature
>>>>> across the SoC and the current settings will heavily trigger the tsens
>>>>> UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
>>>>> constraints which are currently defined in the DT. And since the CPUs
>>>>> are not hooked in the thermal trip points, the potential interrupts and
>>>>> calculations are a waste of system resources.
>>>>>
>>>>> Instead, set higher temperatures in the CPU trip points, and hook some CPU
>>>>> idle injector with a 100% duty cycle at the highest trip point in the case
>>>>> the hardware DCVS cannot handle the temperature surge, and try our best to
>>>>> avoid reaching the critical temperature trip point which should trigger an
>>>>> inevitable thermal shutdown.
>>>>>
>>>>
>>>> Are you able to hit these higher temperatures? Do you have some test
>>>> case where the idle-injection shows to be successful in blocking us from
>>>> reaching the critical temp?
>>>
>>> No, I've been able to test idle-injection and observed a noticeable effect
>>> but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from
>>> scaling down and let the temp go higher ?
>>>
>>
>> I don't know how to override that configuration. I'll try to get some answers.

SDM845 seems to expose a couple SCM calls for this purpose and it's already
wired up in drivers/thermal/qcom/lmh.c

>>>> E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
>>>> the critical trip for when the hardware fails us.
>>>
>>> It's the goal here aswell
>>>
>>
>> How about simplifying the patch by removing the idle-injection step and
>> just rely on LMH/EPSS and the "critical" trip (at least until someone
>> can prove that there's value in the extra mitigation)?
>
> OK, but I see value in this idle injection mitigation in that case LMH/EPSS
> fails, the only factor in control of HLOS is by stopping scheduling tasks
> since frequency won't be able to scale anymore.

If LMH fails, your SoC is probably cooked already, anyway :(

I'm not sure why idle injection isn't enabled by default if no other
cooling methods are found. Perhaps that could be discussed with some
thermal folks..

> Anyway, I agree it can be added later on, so should I drop the 2 trip points
> and only leave the critical one ?

I think sticking with critical=Tjmax + critical-action = "reboot" may be
the way to go here. We may want to give some folks a heads up, so they
can wire up skin sensors on their devices ahead of these changes landing
tree-wide.

Konrad