On Wed, Jan 08, 2025 at 10:15:34AM +0100, Neil Armstrong wrote:
> On 08/01/2025 04:11, Bjorn Andersson wrote:
> > On Tue, Jan 07, 2025 at 09:13:18AM +0100, Neil Armstrong wrote:
> > > Hi,
> > >
> > > On 07/01/2025 00:39, Bjorn Andersson wrote:
> > > > On Fri, Jan 03, 2025 at 03:38:26PM +0100, Neil Armstrong wrote:
> > > > > On the SM8650, the dynamic clock and voltage scaling (DCVS) is done in an
> > > > > hardware controlled loop using the LMH and EPSS blocks with constraints and
> > > > > OPPs programmed in the board firmware.
> > > > >
> > > > > Since the Hardware does a better job at maintaining the CPUs temperature
> > > > > in an acceptable range by taking in account more parameters like the die
> > > > > characteristics or other factory fused values, it makes no sense to try
> > > > > and reproduce a similar set of constraints with the Linux cpufreq thermal
> > > > > core.
> > > > >
> > > > > In addition, the tsens IP is responsible for monitoring the temperature
> > > > > across the SoC and the current settings will heavily trigger the tsens
> > > > > UP/LOW interrupts if the CPU temperatures reaches the hardware thermal
> > > > > constraints which are currently defined in the DT. And since the CPUs
> > > > > are not hooked in the thermal trip points, the potential interrupts and
> > > > > calculations are a waste of system resources.
> > > > >
> > > > > Instead, set higher temperatures in the CPU trip points, and hook some CPU
> > > > > idle injector with a 100% duty cycle at the highest trip point in the case
> > > > > the hardware DCVS cannot handle the temperature surge, and try our best to
> > > > > avoid reaching the critical temperature trip point which should trigger an
> > > > > inevitable thermal shutdown.
> > > > >
> > > >
> > > > Are you able to hit these higher temperatures? Do you have some test
> > > > case where the idle-injection shows to be successful in blocking us from
> > > > reaching the critical temp?
> > >
> > > No, I've been able to test idle-injection and observed a noticeable effect
> > > but I had to set lower trip, do you know how I can easily "block" LMH/EPSS from
> > > scaling down and let the temp go higher ?
> > >
> >
> > I don't know how to override that configuration.
> >
> > > >
> > > > E.g. in X13s (SC8280XP) we opted for relying on LMH/EPSS and define only
> > > > the critical trip for when the hardware fails us.
> > >
> > > It's the goal here aswell
> > >
> >
> > How about simplifying the patch by removing the idle-injection step and
> > just rely on LMH/EPSS and the "critical" trip (at least until someone
> > can prove that there's value in the extra mitigation)?
>
> OK, but I see value in this idle injection mitigation in that case LMH/EPSS
> fails, the only factor in control of HLOS is by stopping scheduling tasks
> since frequency won't be able to scale anymore.
>

I think that sounds good, but afaict we don't have any indication of
this being a problem and we don't have any way to test that it actually
solves that problem.

> Anyway, I agree it can be added later on, so should I drop the 2 trip points
> and only leave the critical one ?
>

I think that's a simple and functional starting point - and it solves
your IRQ issue.

Regards,
Bjorn