Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor

Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> · Tue, 15 Jun 2021 18:18:21 +0200

On 15/06/2021 15:01, Dmitry Osipenko wrote:
> 15.06.2021 13:26, Viresh Kumar пишет:
>> On 15-06-21, 12:03, Daniel Lezcano wrote:
>>>
>>> [Cc Viresh]
>>>
>>> On 29/05/2021 19:09, Dmitry Osipenko wrote:
>>>> All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which
>>>> monitors temperature and voltage of the SoC. Sensors control CPU frequency
>>>> throttling, which is activated by hardware once preprogrammed temperature
>>>> level is breached, they also send signal to Power Management controller to
>>>> perform emergency shutdown on a critical overheat of the SoC die. Add
>>>> driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor
>>>> and a cooling device.
>>>
>>> IMO it does not make sense to expose the hardware throttling mechanism
>>> as a cooling device because it is not usable anywhere from the thermal
>>> framework.
>>>
>>> Moreover, that will collide with the thermal / cpufreq framework
>>> mitigation (hardware sets the frequency but the software thinks the freq
>>> is different), right ?
> 
> H/w mitigation is additional and should be transparent to the software
> mitigation. The software mitigation is much more flexible, but it has
> latency. Software also could crash and hang.
> 
> Hardware mitigation doesn't have latency and it will continue to work
> regardless of the software state.

Yes, I agree. Both solutions have their pros and cons. However, I don't
think they can co-exist sanely.

> The CCF driver is aware about the h/w cooling status [1], hence software
> sees the actual frequency.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?id=344d5df34f5abd468267daa98f041abf90b2f660

Ah interesting, thanks for the pointer.

What I'm worried about is the consistency with cpufreq.

Probably cpufreq_update_limits() should be called from the interrupt
handler.

>> I am not even sure what the cooling device is doing here:
>>
>> tegra_tsensor_set_cur_state() is not implemented and it says hardware
>> changed it by itself. What is the benefit you are getting out of the
>> cooling device here ?
> 
> It allows userspace to check whether hardware cooling is active via the
> cooling_device sysfs. Otherwise we don't have ability to check whether
> h/w cooling is active, I think it's a useful information. It's also
> interesting to see the cooling_device stats, showing how many times h/w
> mitigation was active.

Actually the stats are for software mitigation. For the driver, create a
debugfs entry like what do the other drivers or a module parameter with
the stats.

>>> The hardware limiter should let know the cpufreq framework about the
>>> frequency change.
>>>
>>> 	https://lkml.org/lkml/2021/6/8/1792
>>>
>>> May be post the sensor without the hw limiter for now and address that
>>> in a separate series ?
>>
> 
> I wasn't aware about existence of the thermal pressure, thank you for
> pointing at it. At a quick glance it should be possible to benefit from
> the information about the additional pressure.
> 
> Seems the current thermal pressure API assumes that there is only one
> user of the API. So it's impossible to aggregate the pressure from
> different sources, like software cpufreq pressure + h/w freq pressure.
> Correct? If yes, then please let me know yours thoughts about the best
> approach of supporting the aggregation.

That is a good question. IMO, first step would be to call
cpufreq_update_limits().

[ Cc Thara who implemented the thermal pressure ]

May be Thara has an idea about how to aggregate both? There is another
series floating around with hardware limiter [1] and the same problematic.

 [1] https://lkml.org/lkml/2021/6/8/1791

> I'll factor out the h/w limiter from this patchset and prepare the v4.
> Thank you all for taking a look at the patches.

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog