On Fri, Sep 13, 2019 at 3:35 PM H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
>
> Hi Daniel,
>
> > On 13.09.2019 at 22:11, Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> wrote:
> >
> > On 13/09/2019 20:46, Adam Ford wrote:
> >> On Fri, Sep 13, 2019 at 12:18 PM Daniel Lezcano
> >> <daniel.lezcano@xxxxxxxxxx> wrote:
> >>>
> >>> On 13/09/2019 18:51, H. Nikolaus Schaller wrote:
> >>>
> >>> [ ... ]
> >>>
> >>>>> Good news (I think)
> >>>>>
> >>>>> With cooling-device = <&cpu 1 2> setup, I was able to ask the max
> >>>>> frequency and it returned 600MHz.
> >>>>>
> >>>>> # cat /sys/devices/virtual/thermal/thermal_zone0/temp
> >>>>> 58500
> >>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies
> >>>>> 300000 600000 800000
> >>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_m
> >>>>> scaling_max_freq scaling_min_freq
> >>>>> # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
> >>>>> 600000
> >>>>
> >>>> looks good!
> >>>> But we have to understand what the <&cpu 1 2> exactly means...
> >>>>
> >>>> Hopefully someone reading your RFCv2 can answer...
> >>>
> >> Daniel,
> >>
> >> Thank you for replying.
> >>
> >>> I may have missed the question :)
> >>>
> >>> These are the states allowed for the cooling device (the one you can see
> >>> in the /sys/class/thermal/cooling_device0/max_state). As the logic is
> >>> inverted for cpufreq, that can be confusing.
> >>
> >> I think that's what has me confused.
> >>
> >>>
> >>> If it was a fan with, let's say 5 speeds, you would use <&fan 0 5>, so
> >>> when the mitigation begins the cooling device state is 0 and then the
> >>> thermal governor increases the state until it sees a cooling effect.
> >>>
> >>> If <&fan 0 2> is set, the governor won't set a state above 2 even if the
> >>> temperature increases.
> >>
> >> I am not sure I know what you mean by 'state' in this context.
> >
> > A thermal zone is managed by the thermal framework as follows:
> > - a sensor
> > - a governor
> > - a cooling device
> >
> > The governor gets the temperature via the sensor and depending on the
> > temperature it will increase or decrease the cooling effect of the
> > cooling device. With a fan, that means it will increase or decrease its
> > speed. With cpufreq, it will decrease or increase the OPP.
> >
> > These are discrete values the governor will use to set the cooling
> > effect. The state is one of these values (the current speed or the
> > current OPP index).
> >
> > Depending on the cooling device, the number of states differs.
> >
> > In the context above, the fan cooling device can be stopped (state=0),
> > running (state=1), running faster (state=2).
> >
> > As the node tells to use no more than 2, then the governor will never go
> > to running much faster (state=3). (That's an example).
> >
> >>> When the cooling driver is able to return the number of states it
> >>> supports, it is safe to set the states to THERMAL_NO_LIMIT and let the
> >>> governor find the balance point.
> >>
> >> If the cooling driver is using cpufreq, is the number of supported
> >> states equal to the number of operating points given to cpufreq?
> >
> > Yes, absolutely if THERMAL_NO_LIMIT is set [1] (which is what is done
> > in most cases). Otherwise it will use the boundaries set in <&cpu
> > limit_low limit_high>
> >
> > When changing the limits, a state=1 has a different meaning.
> >
> > For example: 7 OPPs available
> >
> > <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> : state=[0..7]
> >
> > <&cpu 0 2> : state=[0..2] (1, 2)
> >
> > <&cpu 5 7> : state=[0..3] (5, 6, 7)
> >
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/thermal/cpu_cooling.c#n334
> >
> >>> Now if the cooling device is cpufreq, the state order is inverted,
> >>> because the cooling effect happens when decreasing the OPP.
> >>>
> >>> If the board supports 7 OPPs, the state 0 is 7 - 0, so no mitigation, if
> >>> the state is 1, the cpufreq is throttled to the 6th OPP, 2 to the 5th OPP
> >>> etc.
> >>
> >> I am not sure how the state would be set to 2.
> >
> > That is a governor decision. Let me give an example with a hikey960
> > board which has very fast temperature transitions, so it is simpler to
> > illustrate the behavior. The trip point is 75°C.
> >
> > Imagine the CPU gets loaded 100%, the cpufreq sets the OPP to the max
> > (2.36GHz), as the temperature is still under 75°C, there is no
> > mitigation yet, so the cooling device state is 0.
> >
> > In a very few seconds the temperature reaches 75°C, that triggers the
> > monitoring of the thermal zone and the mitigation begins, then the
> > temperature continues to increase very quickly to 78°C, the governor sees
> > we are above the trip point and increments the cooling device state
> > (state=>1). That leads to an OPP change from 2.36GHz to 2.11GHz.
> >
> > The governor continues to read the temperature and sees the temperature
> > is still increasing (even if that happens more slowly), so it
> > increases the state again (state=>2). That leads to an OPP change from
> > 2.11GHz to 1.8GHz.
> >
> > The governor continues to read the temperature and sees the temperature
> > decrease, it does nothing.
>
> Ah, I think our misunderstanding is that the governor "enables" and
> "disables" a set of OPPs. Rather it goes down or up in the list if
> above or below a trip point.
>
> >
> > The governor continues to read the temperature, sees the temperature
> > decreases and is below 75°C, it decreases the state (state=>1), the OPP
> > changes to 2.36GHz.
> >
> > The temperature then increases, etc ...
> >
> > Actually the governors do more than that but it is for the example.
> >
> > So it is a bad idea to set boundaries for the cooling device state as
> > that may prevent the governor from taking the right decision for the cooling
> > effect.
> > Imagine in the example above, we set the max state to 1 for the
> > cooling device, that would mean the governor won't be able to stop the
> > temperature increasing, thus ending up in a hard reboot.
>
> Well, the data sheet only requires that the high speed OPPs are only
> used below 90°C. If I understand correctly if we set the trip point to
> 90°C it will simply go down through the full list of OPPs. This will
> clearly avoid the high speed OPPs (and potentially some low-speed
> ones, but that does no harm).
>
> So our approach "how to make it disable these two OPPs" seems to be
> wrong. Rather, we have to think "make sure the temperature
> stays below 90°C".
>
> And is it true that we do not have to define mapping for the "critical"
> trip points?
>
> >
> >>> Now the different combinations:
> >>>
> >>> <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT> the governor will use the state
> >>> 0 to 7.
> >>>
> >>> <&cpu THERMAL_NO_LIMIT 2> the governor will use the state 0 to 2
> >>
> >> What would be the difference between <&cpu THERMAL_NO_LIMIT 2> and
> >> <&cpu 0 2> ?
> >> (if there is any)
> >
> > There is no difference.
> >
> >
> >>> <&cpu 1 2> the governor will use the state 1 and 2. That means there is
> >>> always the cooling effect as the governor won't set it to zero thus
> >>> stopping the mitigation.
> >>
> >> For the purposes of the board in question, we have 4 operating points,
> >> 300MHz, 600MHz, 800MHz and 1GHz. Once the board reaches 90C, we need
> >> them to cease operation at 800MHz and 1GHz and only permit operation
> >> at 300MHz and 600MHz. I am going under the assumption that the cpu
> >> index[0] would be for 300MHz, index[1] = 600MHz, etc.
> >>
> >> If I am interpreting your comment correctly, I should set <&cpu
> >> THERMAL_NO_LIMIT 2> which would allow it to either not cool and run up
> >> to 600MHz and not exceed, is that correct?
> >
> > Nope, it will mean the cooling device can only reduce to 800MHz and to
> > 600MHz to mitigate.
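
The inverted state numbering discussed above can be illustrated with a
small sketch. It is not kernel code: the helper name allowed_frequencies()
and the use of None as a stand-in for THERMAL_NO_LIMIT are assumptions made
for illustration only. It takes the four operating points mentioned for
this board and treats state 0 as the fastest OPP, as described in the
thread.

```python
# Illustrative sketch only, not the kernel implementation.
# Assumption: None stands in for the THERMAL_NO_LIMIT device tree macro.
THERMAL_NO_LIMIT = None


def allowed_frequencies(opps_khz, lower, upper):
    """Map each permitted cooling state to the cpufreq OPP it selects.

    State 0 is the fastest OPP (no mitigation); each higher state
    steps down to the next slower OPP.
    """
    fastest_first = sorted(opps_khz, reverse=True)
    max_state = len(fastest_first) - 1
    lo = 0 if lower is THERMAL_NO_LIMIT else lower
    hi = max_state if upper is THERMAL_NO_LIMIT else min(upper, max_state)
    return {state: fastest_first[state] for state in range(lo, hi + 1)}


# The four operating points named for the board in this thread, in kHz.
BOARD_OPPS_KHZ = [300000, 600000, 800000, 1000000]

# <&cpu THERMAL_NO_LIMIT 2>: states 0..2, so 300MHz is never selected.
print(allowed_frequencies(BOARD_OPPS_KHZ, THERMAL_NO_LIMIT, 2))
# {0: 1000000, 1: 800000, 2: 600000}

# <&cpu 1 2>: state 0 is excluded, so some throttling is always applied.
print(allowed_frequencies(BOARD_OPPS_KHZ, 1, 2))
# {1: 800000, 2: 600000}
```

With only the three frequencies reported by scaling_available_frequencies
earlier in the thread (300000 600000 800000), the same mapping puts state 1
at 600000, which is consistent with the scaling_max_freq value observed
with <&cpu 1 2>.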
> >
> > Actually neither the thermal framework nor the kernel is designed to handle
> > this case. They assume the OPPs are stable whatever the thermal situation.
> >
> > That is the reason why I think it is a very interesting use case because
> > it introduces a temperature constraint in addition to a duration for a
> > certain OPP. IMO, that could be an extension of the turbo-mode.
> >
> > With what we have now, I doubt it is feasible.
> >
> > The best we can do is to prevent reaching 90°C, so we remove the OPP
> > temperature constraint. I suppose 85°C is a safe temperature to stick to.
> >
> > And in order to let the governor have a free hand.
> >
> > <&cpu THERMAL_NO_LIMIT THERMAL_NO_LIMIT>
> >
> > I don't think that will have a significant impact on performance
> > compared to being able to run at a higher temperature with fewer OPPs.

Thank you for the explanation. I think I'll ask Tony to drop this RFC
since we have what you're proposing already in a separate series.

I appreciate your explanations.

adam

> >
> >
> >>> Does it clarify the DT spec?
> >>>
> >>
> >> I think your reply to my inquiry might. If possible, it would be nice
> >> to get this documented into the bindings doc for others in the future.
> >> I can do it, but someone with a better understanding of the concept
> >> may be more qualified. I can totally understand why some may want to
> >> integrate this into their SoC device trees to slow the processor when
> >> hot.
> >>
> >> Thank you for taking the time to review this. I appreciate it.
> >>
> >> adam
>
> BR,
> Nikolaus
>
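
For readers of the archive, the governor walk-through from the hikey960
example can be sketched as a toy model. This is a deliberate
simplification under stated assumptions: the functions step() and run()
and the temperature samples are invented for illustration, and the real
governors do more than this, as noted in the thread. The sketch only shows
how the lower and upper limits of the cooling map bound the state the
governor can pick.

```python
# Toy model of the behaviour described in the thread, not a real governor.
# Assumptions: step() and run() are hypothetical helpers, and the
# temperature samples below are made up to follow the narrative.


def step(state, temp_c, prev_temp_c, trip_c, lower, upper):
    """Return the next cooling state for one temperature reading."""
    rising = temp_c > prev_temp_c
    if temp_c >= trip_c:
        # Above the trip point: throttle more only while still heating up.
        return min(state + 1, upper) if rising else state
    if not rising:
        # Below the trip point and not heating up: relax the mitigation.
        return max(state - 1, lower)
    return state


def run(temps_c, trip_c=75, lower=0, upper=2):
    """Feed a series of readings to step() and collect the states."""
    state = lower
    prev = temps_c[0]
    states = []
    for temp in temps_c:
        state = step(state, temp, prev, trip_c, lower, upper)
        states.append(state)
        prev = temp
    return states


SAMPLES_C = [70, 78, 80, 79, 74]

print(run(SAMPLES_C))           # [0, 1, 2, 2, 1]
print(run(SAMPLES_C, upper=1))  # [0, 1, 1, 1, 0]
print(run(SAMPLES_C, lower=1))  # [1, 2, 2, 2, 1]
```

The second call shows the concern raised about boundaries: with the upper
limit at 1 the model cannot throttle further while the temperature is
still rising. The third call shows that a lower limit of 1 keeps some
mitigation active even below the trip point.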