Re: [RFC v1 3/3] thermal/drivers/mediatek/lvts_thermal: add mt7988 support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Il 13/09/23 12:52, Frank Wunderlich ha scritto:
Am 13. September 2023 10:16:51 MESZ schrieb AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>:
Hi angelo,

thanks for first look

Il 11/09/23 20:33, Frank Wunderlich ha scritto:
From: Frank Wunderlich <frank-w@xxxxxxxxxxxxxxx>

Add Support for mediatek fologic 880/MT7988.

Signed-off-by: Frank Wunderlich <frank-w@xxxxxxxxxxxxxxx>
---
   drivers/thermal/mediatek/lvts_thermal.c | 73 +++++++++++++++++++++++++
   1 file changed, 73 insertions(+)

diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c
index c1004b4da3b6..48b257a3c80e 100644
--- a/drivers/thermal/mediatek/lvts_thermal.c
+++ b/drivers/thermal/mediatek/lvts_thermal.c
@@ -82,6 +82,8 @@
   #define LVTS_GOLDEN_TEMP_DEFAULT	50
   #define LVTS_COEFF_A_MT8195			-250460
   #define LVTS_COEFF_B_MT8195			250460
+#define LVTS_COEFF_A_MT7988			-204650
+#define LVTS_COEFF_B_MT7988			204650
     #define LVTS_MSR_IMMEDIATE_MODE		0
   #define LVTS_MSR_FILTERED_MODE		1
@@ -1272,6 +1274,67 @@ static int lvts_remove(struct platform_device *pdev)
   	return 0;
   }
   +/*
+ * LVTS MT7988
+ */
+#define LVTS_HW_SHUTDOWN_MT7988	117000

Are you sure that this chip's Tj is >117°C ?!

Looks a bit high... if it is exactly 117°C, I would suggest cutting earlier,
either at 110 (safe side) or 115: after all, this is a life-saver feature and
the chip is actually never meant to *constantly* work at 110°C (as it would
degrade fast and say goodbye earlier than "planned").

I took values from SDK

https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/refs/heads/master/target/linux/mediatek/files-5.4/drivers/thermal/mediatek/soc_temp_lvts.c#1483


That kernel also defines 117°C for MT8195, which leaves me a bit dubious.

For safety, I would recommend using the same 105°C hw shutdown temperature
for 7988 as well: after all if you think about it, reaching that kind of
temperature means that there's a real problem (fan stopped working and/or
heatsink detached) that needs attention.

This is because you'll have trip points in devicetree that should throttle
the CPU in case it reaches at least a dangerously high temperature (read:
a temperature outside the recommended continuous operation range), bringing
the temperatures down because of the throttling action; I would imagine
throttling the CPU a bit down at 80°C, then a bit more at 90°C - but then,
if the temps won't drop like that, there's clearly a HW-related issue that
must be addressed (like the fan/heatsink scenario that I just described).

Though, take this as an advice; I'll respect whichever decision you'll take.

+//enum mt7988_lvts_domain { MT7988_AP_DOMAIN, MT7988_NUM_DOMAIN };
+
+enum mt7988_lvts_sensor_enum {
+	MT7988_TS3_0,
+	MT7988_TS3_1,
+	MT7988_TS3_2,
+	MT7988_TS3_3,
+	MT7988_TS4_0,
+	MT7988_TS4_1,
+	MT7988_TS4_2,
+	MT7988_TS4_3,
+	MT7988_NUM_TS
+};

This enumeration should be definitions in bindings (mediatek,lvts-thermal.h).

Besides, the LVTS is about internal temperatures, so those TS3_x and 4_x can
be renamed like what was done for MT8192 and MT8195: this is because you will
never see TS3_2 being CPU2 on a board and CPU4 on another, being those - again -
internal to the SoC, hence unchangeable.

Right these sensors are internally only and i took naming from sdk to avoid confusion. And i have not more information about these internal sensors (special meaning),but their values are packed together to get the resulting (average) temperature.

Another reason is that you'll anyway have to refer to those sensors in the
devicetree to configure thermal trips and such, so... :-)

In device tree it will look like this:

https://github.com/frank-w/BPI-Router-Linux/blob/6.5-lvts/arch/arm64/boot/dts/mediatek/mt7988a.dtsi#L771

Daniel has also defined thermal trips there,but these are untested atm. I only verified temperature itself i get from sysfs as far as i can (start at ~40°C and reaching ~70 while running).


Check how it's done in mt8192.dtsi and mt8195.dtsi: we're using the definitions
from the bindings for thermal zones.
At least for consistency (apart from other even more valid considerations), that's
how it should be done: please do it like that.

Besides, I think that you can definitely guess what `TS` is CPU related: checking
the driver and devicetree from the downstream kernel that you provided, what I
understand is:

1. Your naming TS3,4 corresponds to TS2,3 downstream (so it's wrong?)
2. Downstream TS2 is related to the CPU cores, so it should be
   - TS2_0 - CPU0
   - TS2_1 - CPU1
   - TS2_2 - CPU2
   - TS2_3 - CPU3

The other set of thermal sensors seem to be completely unused, so we cannot guess
just by looking at the code. Since you have the hardware, you may be able to take
a position about what they are by checking what sensor heats up for each "action"
(could be ethernet controller or infra or whatever else); if in doubt, just leave
them out, but add a comment saying that there are more sensors and say where, so
that if anyone finds what they're for, it'll be easy to add them.

In any case, adding thermal sensors that you don't know about is meaningless, as
the sense of all this is:
 - Monitoring temperatures of the hardware
 - Taking action to prevent temperature overrun
but if you don't know what you're reading, you can't interpret temperatures for
something unknown, hence you can't as well take action to prevent overruns, as
you won't know what's the maximum temperature for "unknown thing" :-)

Regards,
Angelo




[Index of Archives]     [Device Tree Compilter]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux PCI Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Yosemite Backpacking]


  Powered by Linux