On Mon, Feb 06, 2023 at 03:50:22PM +0100, Daniel Lezcano wrote: > > Hi Thierry, > > did you have the time to look at the get_thermal_instance() removal ? > > > On 26/01/2023 13:55, Thierry Reding wrote: > > > [ 12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp > > [ 12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC > > [ 12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500 > > [ 12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC > > [ 12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC > > [ 12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000 > > [ 12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC > > [ 12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC > > [ 12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC > > > > and after these changes, it turns into: > > > > [ 12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp > > [ 12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC > > [ 12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500 > > [ 12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC > > [ 12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC > > [ 12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000 > > [ 12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC > > [ 12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC > > [ 12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000 > > [ 12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC > > [ 12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC > > [ 12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000 > > [ 12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC > > > > The "programming throttle ..." messages are something I've added locally > > to trace what gets called. So it looks like for "pll" and "mem" thermal > > zones, we now program trip points whereas we previously didn't. My diagnosis above wasn't entirely correct. We're not actually skipping trip point programming for PLL and MEM thermal zones in the current code. Instead, we skip throttle programming. As far as I can tell this is a mechanism built into ACTMON to allow it to automatically throttle when a zone reaches a certain temperature. This is modelled as a cooling device, but internally it's actually done automatically, which is why we have this code that programs the throttle at driver probe time, rather than the on-demand programming that typical cooling device would do (such as a fan). The reason why we have get_thermal_instance() here is to check if this built-in cooling device has been configured for the "hot" trip point. If not, we don't want the throttle programming to happen. This adds the added flexibility of explicitly disabling the automatic throttling by ACTMON and using another cooling device (or none at all) if that's what is needed. Dropping just the call to get_thermal_instance() and relying on the find_throttle_cfg_by_name() function will always return a valid throttle configuration. This is slightly obfuscated because of this: cdev = ts->throt_cfgs[i].cdev; if (get_thermal_instance(tz, cdev, trip_id)) stc = find_throttle_cfg_by_name(ts, cdev->type); As far as I can tell this will always return &ts->throt_cfgs[i], so the find_throttle_cfg_by_name() call is a bit redundant here. I'll look into fixing that. In any case, the important thing is that it would always find a valid throttle configuration and therefore program the throttle, even if we may not want to. Possibly we could work around that by removing this fiddly special case and instead add a new callback for the cooling devices that can be run when they are bound to a thermal zone. This would allow the throttle programming to be initiated from within the thermal core rather than "bolted on" like it is now and should allow us to achieve the same effect but without calling into get_thermal_instance(). I'll try and prototype this, but feel free to suggest anything better if you can think of something. Thierry
Attachment:
signature.asc
Description: PGP signature