On 15/06/2021 15:01, Dmitry Osipenko wrote: > 15.06.2021 13:26, Viresh Kumar пишет: >> On 15-06-21, 12:03, Daniel Lezcano wrote: >>> >>> [Cc Viresh] >>> >>> On 29/05/2021 19:09, Dmitry Osipenko wrote: >>>> All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which >>>> monitors temperature and voltage of the SoC. Sensors control CPU frequency >>>> throttling, which is activated by hardware once preprogrammed temperature >>>> level is breached, they also send signal to Power Management controller to >>>> perform emergency shutdown on a critical overheat of the SoC die. Add >>>> driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor >>>> and a cooling device. >>> >>> IMO it does not make sense to expose the hardware throttling mechanism >>> as a cooling device because it is not usable anywhere from the thermal >>> framework. >>> >>> Moreover, that will collide with the thermal / cpufreq framework >>> mitigation (hardware sets the frequency but the software thinks the freq >>> is different), right ? > > H/w mitigation is additional and should be transparent to the software > mitigation. The software mitigation is much more flexible, but it has > latency. Software also could crash and hang. > > Hardware mitigation doesn't have latency and it will continue to work > regardless of the software state. Yes, I agree. Both solutions have their pros and cons. However, I don't think they can co-exist sanely. > The CCF driver is aware about the h/w cooling status [1], hence software > sees the actual frequency. > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?id=344d5df34f5abd468267daa98f041abf90b2f660 Ah interesting, thanks for the pointer. What I'm worried about is the consistency with cpufreq. Probably cpufreq_update_limits() should be called from the interrupt handler. >> I am not even sure what the cooling device is doing here: >> >> tegra_tsensor_set_cur_state() is not implemented and it says hardware >> changed it by itself. What is the benefit you are getting out of the >> cooling device here ? > > It allows userspace to check whether hardware cooling is active via the > cooling_device sysfs. Otherwise we don't have ability to check whether > h/w cooling is active, I think it's a useful information. It's also > interesting to see the cooling_device stats, showing how many times h/w > mitigation was active. Actually the stats are for software mitigation. For the driver, create a debugfs entry like what do the other drivers or a module parameter with the stats. >>> The hardware limiter should let know the cpufreq framework about the >>> frequency change. >>> >>> https://lkml.org/lkml/2021/6/8/1792 >>> >>> May be post the sensor without the hw limiter for now and address that >>> in a separate series ? >> > > I wasn't aware about existence of the thermal pressure, thank you for > pointing at it. At a quick glance it should be possible to benefit from > the information about the additional pressure. > > Seems the current thermal pressure API assumes that there is only one > user of the API. So it's impossible to aggregate the pressure from > different sources, like software cpufreq pressure + h/w freq pressure. > Correct? If yes, then please let me know yours thoughts about the best > approach of supporting the aggregation. That is a good question. IMO, first step would be to call cpufreq_update_limits(). [ Cc Thara who implemented the thermal pressure ] May be Thara has an idea about how to aggregate both? There is another series floating around with hardware limiter [1] and the same problematic. [1] https://lkml.org/lkml/2021/6/8/1791 > I'll factor out the h/w limiter from this patchset and prepare the v4. > Thank you all for taking a look at the patches. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog