Re: [PATCH 0/7] thermal: enhancements on thermal stats

Hi Eduardo,

On 19/05/2023 05:27, Eduardo Valentin wrote:
> Hello Rafael and Daniel
>
> After a long hiatus, I am returning to more frequent contributions
> to the thermal subsystem, at least until I drain some of the
> commits I have in my trees.
>
> This is the first of several series that will bring improvements
> to the thermal subsystem and enable its use
> in the Baseboard Management Controller (BMC) space, as part
> of the Nitro BMC project. To do so, a few improvements
> and new features were written.
>
> In this series in particular, I present a set of enhancements
> to how we handle statistics. The cooling device stats
> are awesome, but I added a few new entries there. I also
> introduce per-thermal-zone stats here.

From my POV, that kind of information belongs in debugfs; sysfs is not suitable for it.

The cdev stats are a total mess because of the sysfs page size limitation and the combinatorial explosion when there is a large number of states (e.g. a display has 1024 cooling device states, resulting in a 1024 x 1024 matrix, so more than 4MB of memory).
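
To put rough numbers on the example above, here is a minimal sketch. It assumes one 4-byte counter per (from, to) state pair and a 4 KiB page; both are assumptions for illustration, and the helper names are hypothetical, not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed for a dense state-to-state
 * transition matrix with one counter per (from, to) pair.
 */
static size_t trans_table_bytes(size_t nr_states, size_t counter_size)
{
	return nr_states * nr_states * counter_size;
}

/* Hypothetical helper: number of pages needed to hold 'bytes', rounded up. */
static size_t pages_needed(size_t bytes, size_t page_size)
{
	return (bytes + page_size - 1) / page_size;
}
```

With 1024 states and 4-byte counters the matrix alone is 4 MiB, i.e. 1024 pages of 4 KiB, before any per-state bookkeeping is added on top.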

For the record, I'm working on such statistics [1][2], and have optimized the cooling device statistics in order to get rid of the existing sysfs cdev stats.

Actually, all the stats rely on the mitigation episodes. However, for that we need to correctly identify when they begin and when they end. We can have a mitigation episode inside another mitigation episode (e.g. passive mitigation@trip0 and active mitigation@trip1).
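
As an illustration of nested episodes, here is a minimal sketch that tracks one episode per trip point independently, so an episode on trip1 can begin and end while the one on trip0 is still running. The struct, the function name and the hysteresis-based end condition are assumptions made for this example, not the code from the branches referenced in this mail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-trip bookkeeping, temperatures in millidegrees Celsius. */
struct trip_episode {
	int temp;              /* trip temperature */
	int hyst;              /* hysteresis applied when leaving the trip */
	bool active;           /* true while an episode is in progress */
	uint64_t start_ms;     /* timestamp of the current episode start */
	uint64_t total_ms;     /* accumulated duration of finished episodes */
	unsigned int count;    /* number of episodes started */
};

/*
 * Feed one temperature sample. Each trip is evaluated on its own, so
 * episodes may overlap and the array does not need to be sorted.
 */
static void episodes_update(struct trip_episode *trips, size_t n,
			    int temp, uint64_t now_ms)
{
	for (size_t i = 0; i < n; i++) {
		struct trip_episode *t = &trips[i];

		if (!t->active && temp >= t->temp) {
			/* Episode begins: temperature reached the trip. */
			t->active = true;
			t->start_ms = now_ms;
			t->count++;
		} else if (t->active && temp < t->temp - t->hyst) {
			/* Episode ends: temperature fell below trip - hyst. */
			t->active = false;
			t->total_ms += now_ms - t->start_ms;
		}
	}
}
```

Feeding the samples 65000, 85000, 75000 and 50000 to two trips at 60000 and 80000 (2000 of hysteresis each) yields one short episode on the second trip nested inside one longer episode on the first.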

This is not working today because the trip point detection is incorrect; thus the mitigation episodes are also incorrect and, consequently, the stats are de facto incorrect.

There are more details at [3], but the change assumes the trip points are sorted in ascending order, which is wrong; that is why it was not merged.
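
To show why an ordering assumption matters, here is a small sketch contrasting a scan that stops at the first trip above the current temperature with one that checks every trip. Both functions are hypothetical illustrations and are not taken from the commit at [3].

```c
#include <stddef.h>

/*
 * Hypothetical: counts the trips at or below 'temp' but stops at the
 * first trip above it. Only correct if trips[] is sorted ascending.
 */
static size_t trips_reached_sorted_only(const int *trips, size_t n, int temp)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (temp < trips[i])
			break;
	return i;
}

/* Hypothetical: checks every trip, so the array order does not matter. */
static size_t trips_reached_any_order(const int *trips, size_t n, int temp)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (temp >= trips[i])
			count++;
	return count;
}
```

With trips declared as { 90000, 70000, 80000 } and a temperature of 85000, the first function reports no trip reached while the second reports two; on a sorted array both agree.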

The mitigation works but the detection is fuzzy, so the math is inaccurate. As we are at the boundary of a temperature limit, the resulting statistics, when they are not totally inconsistent, do not show us the information we need to optimize the governors.

All the work around the generic trip points is to fix that.

There is a proposal at LPC to add statistics/debug information for thermal; maybe you can participate so we can join our efforts?

  -- Daniel

[1] https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/log/?h=thermal/trip-crossed%2bdebugfs

[2] https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/log/?h=thermal/debugfs-v2

[3] https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/commit/?h=thermal/trip-crossed%2bdebugfs&id=7d713a9128ad9a153de9c3f5b854c6f1acfb3064



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs




