Re: [PATCH] arm64: dts: rockchip: Prevent thermal runaways in RK3308 SoC dtsi

On Friday, 11 October 2024, 11:04:38 CEST, Dragan Simic wrote:
> Hello Jonas,
> 
> On 2024-10-11 10:52, Jonas Karlman wrote:
> > On 2024-10-10 12:19, Dragan Simic wrote:
> >> Until the TSADC, thermal zones, thermal trips and cooling maps are defined
> >> in the RK3308 SoC dtsi, none of the CPU OPPs except the slowest one may be
> >> enabled under any circumstances.  Allowing the DVFS to scale the CPU cores
> >> up without even just the critical CPU thermal trip in place can rather easily
> >> result in thermal runaways and damaged SoCs, which is bad.
> >> 
> >> Thus, leave only the lowest available CPU OPP enabled for now.
> > 
> > This feels like a very aggressive limitation, to only allow the
> > opp-suspend rate, which is not even used under normal load.
> > 
> > I let my Rock Pi S board with a RK3308B variant run "stress -c 8" for
> > around 10 hours and the reported temp only reached around 50-55 deg C,
> > with ambient temp around 20 deg C and the board lying flat on a table
> > without any enclosure or heat sink.
> > 
> > This was running with performance as scaling_governor and cpu running
> > the 1008000 opp.
> 
> Thanks for testing all that!  That's very low CPU temperature under
> stress testing indeed.  Maybe the cooling gets worse and the CPU
> temperature goes higher if the board is installed into some small
> enclosure with no natural or forced airflow?
> 
> > Most RK3308 variant datasheets list 1.3 GHz as max rate for CPU,
> > the K-variant lists 1.2 GHz, and the -S-variants seem to have both
> > reduced voltage and max rate.
> > 
> > The OPPs for this SoC already limit max rate to 1 GHz, which is more than
> > likely good enough to not reach the max temperature of 115-125 deg C as
> > rated in datasheets and vendor DTs.
> > 
> > Adding the tsadc and trips (same/similar as px30) will probably allow us
> > to add/use the "missing" 1.2 and 1.3 GHz OPPs.
> 
> With these insights, I agree that the patch might have been a bit
> too extreme, but it also promotes good practices when it comes to
> upstreaming.  The general rule is not to add CPU or GPU OPPs with
> no proper thermal configuration already in place.
> 
> The patch has already been merged, and as I already noted, [1] I'll
> try to implement, test and submit the proper thermal configuration
> ASAP.  It's up to Heiko to decide whether to drop this patch or not.

Hmm, interesting question ;-) .

Dropping the patch is of course still possible and so far we haven't
actually seen anyone with real-world problems.

And with Jonas' stress test, it does look like nobody will in the
(hopefully short) time till we have thermal management.
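
For reference, the thermal management in question (TSADC sensor, thermal
zone, trips and a cooling map) could look roughly like the sketch below.
This is only an illustration under stated assumptions: the `tsadc` label
is hypothetical since that node is not in rk3308.dtsi yet, the passive
trip temperature and polling delays are placeholders, and the critical
trip simply takes the lower end of the 115-125 deg C range quoted above.
None of these values have been validated on RK3308 hardware.

```dts
/* Illustrative sketch only; values are placeholders, not tested on RK3308. */
#include <dt-bindings/thermal/thermal.h>

/ {
	thermal-zones {
		soc-thermal {
			polling-delay-passive = <20>;	/* ms, while passive cooling is active */
			polling-delay = <1000>;		/* ms, otherwise */
			/* &tsadc is a hypothetical label for a TSADC node still to be added */
			thermal-sensors = <&tsadc 0>;

			trips {
				soc_alert: trip-point-0 {
					temperature = <85000>;	/* millicelsius, placeholder */
					hysteresis = <2000>;
					type = "passive";
				};

				soc_crit: soc-crit {
					temperature = <115000>;	/* millicelsius, lower end of quoted range */
					hysteresis = <2000>;
					type = "critical";
				};
			};

			cooling-maps {
				map0 {
					trip = <&soc_alert>;
					/* requires #cooling-cells = <2> in the referenced CPU node */
					cooling-device = <&cpu0 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
				};
			};
		};
	};
};
```

With something along these lines in place, the critical trip would shut
the system down before the rated limit is reached, and the passive trip
would let the CPU be throttled through the existing OPPs.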

@Dragan, if you're in favor of that I'll drop the patch.


Heiko


> 
> [1] https://lore.kernel.org/linux-rockchip/df92710498f66bcb4580cb2cd1573fb2@xxxxxxxxxxx/
> 
> >> Fixes: 6913c45239fd ("arm64: dts: rockchip: Add core dts for RK3308 SOC")
> >> Cc: stable@xxxxxxxxxxxxxxx
> >> Signed-off-by: Dragan Simic <dsimic@xxxxxxxxxxx>
> >> ---
> >>  arch/arm64/boot/dts/rockchip/rk3308.dtsi | 3 +++
> >>  1 file changed, 3 insertions(+)
> >> 
> >> diff --git a/arch/arm64/boot/dts/rockchip/rk3308.dtsi b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> >> index 31c25de2d689..a7698e1f6b9e 100644
> >> --- a/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> >> +++ b/arch/arm64/boot/dts/rockchip/rk3308.dtsi
> >> @@ -120,16 +120,19 @@ opp-600000000 {
> >>  			opp-hz = /bits/ 64 <600000000>;
> >>  			opp-microvolt = <950000 950000 1340000>;
> >>  			clock-latency-ns = <40000>;
> >> +			status = "disabled";
> >>  		};
> >>  		opp-816000000 {
> >>  			opp-hz = /bits/ 64 <816000000>;
> >>  			opp-microvolt = <1025000 1025000 1340000>;
> >>  			clock-latency-ns = <40000>;
> >> +			status = "disabled";
> >>  		};
> >>  		opp-1008000000 {
> >>  			opp-hz = /bits/ 64 <1008000000>;
> >>  			opp-microvolt = <1125000 1125000 1340000>;
> >>  			clock-latency-ns = <40000>;
> >> +			status = "disabled";
> >>  		};
> >>  	};
> 







