On Fri, Nov 10, 2023 at 02:55:15PM +0100, Thierry Reding wrote:
> On Fri, Oct 13, 2023 at 05:57:13PM +0200, Daniel Lezcano wrote:
> > On 12/10/2023 19:58, Thierry Reding wrote:
> > > From: Thierry Reding <treding@xxxxxxxxxx>
> > >
> > > The SOCTHERM's built-in throttling mechanism doesn't map well to the
> > > concept of a cooling device because it will automatically start to
> > > throttle when the programmed temperature threshold is crossed.
> > >
> > > Remove the cooling device implementation and instead unconditionally
> > > program the throttling for the CPU and GPU thermal zones.
> > >
> > > Signed-off-by: Thierry Reding <treding@xxxxxxxxxx>
> > > ---
> >
> > [ ... ]
> >
> > > +	ret = of_property_read_u32(np, "temperature-millicelsius",
> > > +				   &stc->temperature);
> > > +	if (ret < 0)
> > > +		goto err;
> > > +
> > > +	ret = of_property_read_u32(np, "hysteresis-millicelsius",
> > > +				   &stc->hysteresis);
> > > +	if (ret < 0)
> > > +		goto err;
> > > +
> > > +	stc->num_zones = of_count_phandle_with_args(np, "nvidia,thermal-zones",
> > > +						    NULL);
> > > +	if (stc->num_zones > 0) {
> > > +		struct device_node *zone;
> > > +		unsigned int i;
> > > +
> > > +		stc->zones = devm_kcalloc(ts->dev, stc->num_zones, sizeof(zone),
> > > +					  GFP_KERNEL);
> > > +		if (!stc->zones)
> > > +			return -ENOMEM;
> > > +
> > > +		for (i = 0; i < stc->num_zones; i++) {
> > > +			zone = of_parse_phandle(np, "nvidia,thermal-zones", i);
> > > +			stc->zones[i] = zone;
> > > +		}
> > > +	}
> >
> > What is the connection between the temperature sensor and the hardware
> > limiter?
> >
> > I mean, one hand there is the hardware limiter which is not connected to the
> > sensor neither a thermal zone and it could be self contained in a separate
> > driver. And then there is the temperature sensor.
> >
> > The thermal zone phandle things connected with the throttling bindings
> > sounds like strange to me.
> >
> > What prevents to split the throttling and the sensor into separate code?
>
> Both the temperature sensor and the hardware throttle mechanism are part
> of the same IP block, so it would be quite difficult (and unnecessary)
> to split them into separate drivers.
>
> The hardware throttler uses the temperature sensor's data to initiate
> throttling automatically when certain (programmable) temperature
> thresholds are reached.
>
> The reason why we need to reference the thermal zone is because the
> registers needed to program the throttler are contained within the
> sensor group (which are effectively mapped to thermal zones).
>
> I suppose there are a number of other ways how this could be described.
> The thermal zones could be extended with extra information about the
> throttling, or we could use just the sensor group ID instead of a full
> phandle to reference this.
>
> I was sort of trying to keep things somewhat aligned with the concept of
> thermal zones and not rewrite the entire thing, but perhaps I should go
> back to the drawing board and think about whether there's an even better
> way to describe this in DT.

I've looked at the documentation in a bit more detail, and here's a
high-level overview of what SOCTHERM is.

We have four groups (CPU, GPU, MEM and PLLX), each of which can be
programmed at four different levels. Each level is an identical set of
registers used to program temperature thresholds and throttling, and to
enable or disable that level. An interrupt can be configured for the
temperature thresholds. There's an additional "thermtrip" level, which
only has a threshold that, when reached, causes an emergency,
hardware-induced shutdown of the system.

Any of the generic levels can be used in whatever way we want. The
current convention is to program the thermal zone trip points using
level 0. So for each group we create a thermal zone, and level 0 of each
zone is programmed with the low and high thresholds for a given trip
point. Currently we also use levels 1 and 2 to program the "light" and
"heavy" throttling "indicators".
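Roughly, the layout above could be modelled like this. It's only a
sketch: all of the names (enum soctherm_group, struct soctherm_level,
soctherm_program_throttle() and so on) are made up for illustration and
don't match the tegra-soctherm driver, and programming the low threshold
as temperature minus hysteresis is an assumption on my part. It does show
that, with the current convention, a throttle level can be picked from
just a group ID and the light/heavy indicator, without looking at any
thermal zone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the four SOCTHERM sensor groups. */
enum soctherm_group {
	SOCTHERM_GROUP_CPU,
	SOCTHERM_GROUP_GPU,
	SOCTHERM_GROUP_MEM,
	SOCTHERM_GROUP_PLLX,
	SOCTHERM_NUM_GROUPS,
};

/* The two throttling "indicators" currently in use. */
enum soctherm_indicator {
	SOCTHERM_INDICATOR_LIGHT,
	SOCTHERM_INDICATOR_HEAVY,
};

#define SOCTHERM_NUM_LEVELS 4
#define SOCTHERM_TRIP_LEVEL 0	/* current convention: thermal zone trip points */

/* One level: an identical register set per level, per group. */
struct soctherm_level {
	int low_mc;	/* low threshold, millicelsius */
	int high_mc;	/* high threshold, millicelsius */
	bool throttle;	/* drive a throttling indicator */
	bool enabled;
};

struct soctherm_group_regs {
	struct soctherm_level levels[SOCTHERM_NUM_LEVELS];
	int thermtrip_mc;	/* emergency hardware shutdown threshold */
};

/* Current convention: level 1 is "light", level 2 is "heavy". */
static unsigned int soctherm_indicator_level(enum soctherm_indicator ind)
{
	return ind == SOCTHERM_INDICATOR_LIGHT ? 1 : 2;
}

/*
 * Program a throttle level using only a group ID and an indicator; no
 * thermal zone is consulted. Using (temperature - hysteresis) as the low
 * threshold is an assumption made for this sketch.
 */
static void soctherm_program_throttle(struct soctherm_group_regs *groups,
				      enum soctherm_group group,
				      enum soctherm_indicator ind,
				      int temperature_mc, int hysteresis_mc)
{
	struct soctherm_level *lvl;

	assert(group < SOCTHERM_NUM_GROUPS);

	lvl = &groups[group].levels[soctherm_indicator_level(ind)];
	lvl->high_mc = temperature_mc;
	lvl->low_mc = temperature_mc - hysteresis_mc;
	lvl->throttle = true;
	lvl->enabled = true;
}
```

For example (values are arbitrary), calling soctherm_program_throttle()
with SOCTHERM_GROUP_CPU, SOCTHERM_INDICATOR_HEAVY, 95000 and 2000 enables
level 2 of the CPU group with a 95000 millicelsius high threshold and a
93000 millicelsius low threshold, leaving level 0 free for the trip
points.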
These indicators are in turn used to generate the outputs to the actual
throttling mechanisms (CPU-light, CPU-heavy, GPU-light and GPU-heavy).

There are a few other things that can be done, but I don't fully
understand how they would be useful, and I don't think they've ever been
used, so I'll skip those for now.

Given the above, the thermal zone trip points are fairly clear, and they
are fine as currently implemented.

For the throttling mechanism we could do something that maps more
explicitly to the groups and levels described above, but that could
easily conflict with the trip point programming. Keeping the current
conventions and designing the device tree bindings accordingly would
help avoid any such conflicts. So I think keeping the throttle-cfgs node
is a good fit.

We don't really need to establish a connection between the thermal zone
and the throttle mechanism, though. We can derive the level from the
indicator (light or heavy), and for the group we only need an ID. The
reason I proposed a link to the thermal zone is that the thermal zone
already contains that ID, but we could equally well add an nvidia,group
property (or something along those lines) that tells us which group to
use, rather than trying to get it from a thermal zone.

I'll revise the bindings and see if I can come up with something.

Thierry