On Fri, Oct 13, 2023 at 02:45:57PM +0200, Nicolas Chauvet wrote: > Le ven. 13 oct. 2023 à 13:43, Thierry Reding > <thierry.reding@xxxxxxxxx> a écrit : > > > > On Fri, Oct 13, 2023 at 11:14:25AM +0200, Nicolas Chauvet wrote: > > > Le jeu. 12 oct. 2023 à 19:58, Thierry Reding > > > <thierry.reding@xxxxxxxxx> a écrit : > > > > > > > > From: Thierry Reding <treding@xxxxxxxxxx> > > > > > > > > Hi, > > > > > > > > this set of patches removes the registration of the SOCTHERM internal > > > > throttling mechanism as cooling device. Since this throttling starts > > > > automatically once a certain temperature threshold is crossed, it > > > > doesn't make sense to represent it as a cooling device, which are > > > > typically "manually" activated by the thermal framework when thermal > > > > sensors report temperature thresholds being crossed. > > > > > > > > Instead of using the cooling device mechanism, this statically programs > > > > the throttling mechanism when it is configured in device tree. In order > > > > to do this, an additional device tree property is needed to replace the > > > > information that was previously contained in trip points. > > > > > > > > There's a few preparatory patches to make the removal a bit simpler and > > > > also some follow up cleanups included as well. > > > > > > > > Changes in v2: > > > > - rework the device tree bindings: > > > > - add nvidia,thermal-zones property to attach throttling to zones > > > > - use -millicelsius suffix and add hysteresis > > > > - add patch to store thermal zone device tree node for later use > > > > - add patch to enforce self-encapsulation of the thermal core now that > > > > no drivers need to reach into it anymore > > > > > > > > This applies on top of Daniel's self-encapsulation hardening series: > > > > > > > > https://lore.kernel.org/all/20231012102700.2858952-1-daniel.lezcano@xxxxxxxxxx/ > > > > > > > > Thierry > > > > > > > > Thierry Reding (13): > > > > thermal: Store device tree node for thermal zone devices > > > > dt-bindings: thermal: tegra: Document throttle temperature > > > > dt-bindings: thermal: tegra: Add nvidia,thermal-zones property > > > > thermal: tegra: Use driver-private data consistently > > > > thermal: tegra: Constify SoC-specific data > > > > thermal: tegra: Do not register cooling device > > > > thermal: tegra: Use unsigned int where appropriate > > > > thermal: tegra: Avoid over-allocation of temporary array > > > > thermal: tegra: Remove gratuitous error assignment > > > > thermal: tegra: Minor stylistic cleanups > > > > ARM: tegra: Rework SOCTHERM on Tegra124 > > > > arm64: tegra: Rework SOCTHERM on Tegra132 and Tegra210 > > > > thermal: Enforce self-encapsulation > > > > > > > > .../thermal/nvidia,tegra124-soctherm.yaml | 19 + > > > > arch/arm/boot/dts/nvidia/tegra124.dtsi | 68 +-- > > > > arch/arm64/boot/dts/nvidia/tegra132.dtsi | 66 +-- > > > > arch/arm64/boot/dts/nvidia/tegra210.dtsi | 86 +-- > > > > drivers/thermal/tegra/soctherm.c | 525 ++++++++---------- > > > > drivers/thermal/tegra/soctherm.h | 1 + > > > > drivers/thermal/tegra/tegra124-soctherm.c | 4 + > > > > drivers/thermal/tegra/tegra132-soctherm.c | 4 + > > > > drivers/thermal/tegra/tegra210-soctherm.c | 4 + > > > > drivers/thermal/thermal_core.h | 2 +- > > > > drivers/thermal/thermal_of.c | 3 + > > > > 11 files changed, 329 insertions(+), 453 deletions(-) > > > > > > > > -- > > > > 2.42.0 > > > > > > > > > > I'm still experiencing the following message on jetson-tx1 with this > > > serie applied on top of 6.6-rc5 (with iommu-next and tegra-next > > > applied). > > > oct. 13 10:53:16 jetson-tx1 kernel: max77620-thermal max77620-thermal: > > > Failed to register thermal zone: -19 > > > > Not sure about this one. I don't think I've seen it. Do you know if > > that's somehow caused by this series, or is it preexisting? > > It's pre-existing from this serie. > > > > oct. 13 10:53:16 jetson-tx1 kernel: tegra_soctherm > > > 700e2000.thermal-sensor: throttle-cfg: heavy: no throt prop or invalid > > > prop > > > > > > Is this expected ? > > > > This one is definitely not expected. I have seen this before, and it > > happens when the device tree doesn't contain all the properties that are > > expected. Patch 12 in this series should take care of that. Have you > > made sure that that's been applied and is what the kernel uses to boot > > with? > > Yes, this dtb change in patch12 is propagated to the device (as seen > in /boot/dtbs) > But comparing with what's available at runtime in /proc/device-tree, I > see some changes > > heavy { > - hysteresis-millicelsius = <0xfa0>; > + #cooling-cells = <0x02>; > nvidia,cpu-throt-percent = <0x55>; > nvidia,gpu-throt-level = <0x03>; > nvidia,priority = <0x64>; > - nvidia,thermal-zones = <0x49 0x4a>; > - temperature-millicelsius = <0x180c4>; > + phandle = <0x130>; > }; Okay, that explains the error message. > > I'm using u-boot 2023.07 with EFI boot (L4T 32.7.4). > Could it be that the bootloader has changed these entries ? Can this > be prevented ? I'm not aware of anything in the bootloader that would do this. Some versions of U-Boot that ships with L4T can copy certain nodes in DTB but I have never seen anything that would've touched thermal. Is it possible that you're not loading the DTB and end up receiving one from UEFI or cboot? > (MAC ethernet address is set as appropropriate). That's a completely separate mechanism and shouldn't touch thermal at all. Thierry
Attachment:
signature.asc
Description: PGP signature