Re: [PATCH v8 00/29] Rework the trip points creation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 03/10/2022 16:10, Marek Szyprowski wrote:
Hi Daniel,

On 03.10.2022 11:25, Daniel Lezcano wrote:
This work is the pre-requisite of handling correctly when the trip
point are crossed. For that we need to rework how the trip points are
declared and assigned to a thermal zone.

Even if it appears to be a common sense to have the trip points being
ordered, this no guarantee neither documentation telling that is the
case.

One solution could have been to create an ordered array of trips built
when registering the thermal zone by calling the different get_trip*
ops. However those ops receive a thermal zone pointer which is not
known as it is in the process of creating it.

This cyclic dependency shows we have to rework how we manage the trip
points.

Actually, all the trip points definition can be common to the backend
sensor drivers and we can factor out the thermal trip structure in all
of them.

Then, as we register the thermal trips array, they will be available
in the thermal zone structure and a core function can return the trip
given its id.

The get_trip_* ops won't be needed anymore and could be removed. The
resulting code will be another step forward to a self encapsulated
generic thermal framework.

Most of the drivers can be converted more or less easily. This series
does a first round with most of the drivers. Some remain and will be
converted but with a smaller set of changes as the conversion is a bit
more complex.

Changelog:
v8:
- Pretty oneline change and parenthesis removal (Rafael)
- Collected tags
v7:
- Added missing return 0 in the x86_pkg_temp driver
v6:
- Improved the code for the get_crit_temp() function as suggested by
Rafael
- Removed inner parenthesis in the set_trip_temp() function and invert the
conditions. Check the type of the trip point is unchanged
- Folded patch 4 with 1
- Add per thermal zone info message in the bang-bang governor
- Folded the fix for an uninitialized variable in
int340x_thermal_zone_add()
v5:
- Fixed a deadlock when calling thermal_zone_get_trip() while
handling the thermal zone lock
- Remove an extra line in the sysfs change
- Collected tags
v4:
- Remove extra lines on exynos changes as reported by Krzysztof Kozlowski
- Collected tags
v3:
- Reorg the series to be git-bisect safe
- Added the set_trip generic function
- Added the get_crit_temp generic function
- Removed more dead code in the thermal-of
- Fixed the exynos changelog
- Fixed the error check for the exynos drivers
- Collected tags
v2:
- Added missing EXPORT_SYMBOL_GPL() for thermal_zone_get_trip()
- Removed tab whitespace in the acerhdf driver
- Collected tags

Cc: Raju Rangoju <rajur@xxxxxxxxxxx>
Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
Cc: Eric Dumazet <edumazet@xxxxxxxxxx>
Cc: Jakub Kicinski <kuba@xxxxxxxxxx>
Cc: Paolo Abeni <pabeni@xxxxxxxxxx>
Cc: Peter Kaestle <peter@xxxxxxxx>
Cc: Hans de Goede <hdegoede@xxxxxxxxxx>
Cc: Mark Gross <markgross@xxxxxxxxxx>
Cc: Miquel Raynal <miquel.raynal@xxxxxxxxxxx>
Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
Cc: Amit Kucheria <amitk@xxxxxxxxxx>
Cc: Zhang Rui <rui.zhang@xxxxxxxxx>
Cc: Nicolas Saenz Julienne <nsaenz@xxxxxxxxxx>
Cc: Broadcom Kernel Team <bcm-kernel-feedback-list@xxxxxxxxxxxx>
Cc: Florian Fainelli <f.fainelli@xxxxxxxxx>
Cc: Ray Jui <rjui@xxxxxxxxxxxx>
Cc: Scott Branden <sbranden@xxxxxxxxxxxx>
Cc: Support Opensource <support.opensource@xxxxxxxxxxx>
Cc: Lukasz Luba <lukasz.luba@xxxxxxx>
Cc: Shawn Guo <shawnguo@xxxxxxxxxx>
Cc: Sascha Hauer <s.hauer@xxxxxxxxxxxxxx>
Cc: Pengutronix Kernel Team <kernel@xxxxxxxxxxxxxx>
Cc: Fabio Estevam <festevam@xxxxxxxxx>
Cc: NXP Linux Team <linux-imx@xxxxxxx>
Cc: Thara Gopinath <thara.gopinath@xxxxxxxxxx>
Cc: Andy Gross <agross@xxxxxxxxxx>
Cc: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx>
Cc: "Niklas Söderlund" <niklas.soderlund@xxxxxxxxxxxx>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@xxxxxxxxx>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@xxxxxxxxxx>
Cc: Alim Akhtar <alim.akhtar@xxxxxxxxxxx>
Cc: Thierry Reding <thierry.reding@xxxxxxxxx>
Cc: Jonathan Hunter <jonathanh@xxxxxxxxxx>
Cc: Eduardo Valentin <edubezval@xxxxxxxxx>
Cc: Keerthy <j-keerthy@xxxxxx>
Cc: Kunihiko Hayashi <hayashi.kunihiko@xxxxxxxxxxxxx>
Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
Cc: Antoine Tenart <atenart@xxxxxxxxxx>
Cc: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
Cc: Dmitry Osipenko <digetx@xxxxxxxxx>
Cc: netdev@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: platform-driver-x86@xxxxxxxxxxxxxxx
Cc: linux-pm@xxxxxxxxxxxxxxx
Cc: linux-rpi-kernel@xxxxxxxxxxxxxxxxxxx
Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
Cc: linux-arm-msm@xxxxxxxxxxxxxxx
Cc: linux-renesas-soc@xxxxxxxxxxxxxxx
Cc: linux-samsung-soc@xxxxxxxxxxxxxxx
Cc: linux-tegra@xxxxxxxxxxxxxxx
Cc: linux-omap@xxxxxxxxxxxxxxx

Daniel Lezcano (29):
thermal/core: Add a generic thermal_zone_get_trip() function
thermal/sysfs: Always expose hysteresis attributes
thermal/core: Add a generic thermal_zone_set_trip() function
thermal/core/governors: Use thermal_zone_get_trip() instead of ops
functions
thermal/of: Use generic thermal_zone_get_trip() function
thermal/of: Remove unused functions
thermal/drivers/exynos: Use generic thermal_zone_get_trip() function
thermal/drivers/exynos: of_thermal_get_ntrips()
thermal/drivers/exynos: Replace of_thermal_is_trip_valid() by
thermal_zone_get_trip()
thermal/drivers/tegra: Use generic thermal_zone_get_trip() function
thermal/drivers/uniphier: Use generic thermal_zone_get_trip() function
thermal/drivers/hisi: Use generic thermal_zone_get_trip() function
thermal/drivers/qcom: Use generic thermal_zone_get_trip() function
thermal/drivers/armada: Use generic thermal_zone_get_trip() function
thermal/drivers/rcar_gen3: Use the generic function to get the number
of trips
thermal/of: Remove of_thermal_get_ntrips()
thermal/of: Remove of_thermal_is_trip_valid()
thermal/of: Remove of_thermal_set_trip_hyst()
thermal/of: Remove of_thermal_get_crit_temp()
thermal/drivers/st: Use generic trip points
thermal/drivers/imx: Use generic thermal_zone_get_trip() function
thermal/drivers/rcar: Use generic thermal_zone_get_trip() function
thermal/drivers/broadcom: Use generic thermal_zone_get_trip() function
thermal/drivers/da9062: Use generic thermal_zone_get_trip() function
thermal/drivers/ti: Remove unused macros ti_thermal_get_trip_value() /
ti_thermal_trip_is_valid()
thermal/drivers/acerhdf: Use generic thermal_zone_get_trip() function
thermal/drivers/cxgb4: Use generic thermal_zone_get_trip() function
thermal/intel/int340x: Replace parameter to simplify
thermal/drivers/intel: Use generic thermal_zone_get_trip() function

I've tested this v8 patchset after fixing the issue with Exynos TMU with
https://lore.kernel.org/all/20221003132943.1383065-1-daniel.lezcano@xxxxxxxxxx/
patch and I got the following lockdep warning on all Exynos-based boards:


======================================================
WARNING: possible circular locking dependency detected
6.0.0-rc1-00083-ge5c9d117223e #12945 Not tainted
------------------------------------------------------
swapper/0/1 is trying to acquire lock:
c1ce66b0 (&data->lock#2){+.+.}-{3:3}, at: exynos_get_temp+0x3c/0xc8

but task is already holding lock:
c2979b94 (&tz->lock){+.+.}-{3:3}, at:
thermal_zone_device_update.part.0+0x3c/0x528

which lock already depends on the new lock.

I'm wondering if the problem is not already there and related to data->lock ...

Doesn't the thermal zone lock already prevent racy access to the data structure?

Another question: if the sensor clock is disabled after reading it, how does the hardware update the temperature and detect the programed threshold is crossed?

the existing dependency chain (in reverse order) is:

-> #1 (&tz->lock){+.+.}-{3:3}:
         mutex_lock_nested+0x1c/0x24
         thermal_zone_get_trip+0x20/0x44
         exynos_tmu_initialize+0x144/0x1e0
         exynos_tmu_probe+0x2b0/0x728
         platform_probe+0x5c/0xb8
         really_probe+0xe0/0x414
         __driver_probe_device+0xa0/0x208
         driver_probe_device+0x30/0xc0
         __driver_attach+0xf0/0x1f0
         bus_for_each_dev+0x70/0xb0
         bus_add_driver+0x174/0x218
         driver_register+0x88/0x11c
         do_one_initcall+0x64/0x380
         kernel_init_freeable+0x1c0/0x224
         kernel_init+0x18/0x12c
         ret_from_fork+0x14/0x2c
         0x0

-> #0 (&data->lock#2){+.+.}-{3:3}:
         lock_acquire+0x124/0x3e4
         __mutex_lock+0x90/0x948
         mutex_lock_nested+0x1c/0x24
         exynos_get_temp+0x3c/0xc8
         __thermal_zone_get_temp+0x5c/0x12c
         thermal_zone_device_update.part.0+0x78/0x528
         __thermal_cooling_device_register.part.0+0x298/0x354
         __cpufreq_cooling_register.constprop.0+0x138/0x218
         of_cpufreq_cooling_register+0x48/0x8c
         cpufreq_online+0x8d0/0xb2c
         cpufreq_add_dev+0xb0/0xec
         subsys_interface_register+0x108/0x118
         cpufreq_register_driver+0x15c/0x380
         dt_cpufreq_probe+0x2e4/0x434
         platform_probe+0x5c/0xb8
         really_probe+0xe0/0x414
         __driver_probe_device+0xa0/0x208
         driver_probe_device+0x30/0xc0
         __driver_attach+0xf0/0x1f0
         bus_for_each_dev+0x70/0xb0
         bus_add_driver+0x174/0x218
         driver_register+0x88/0x11c
         do_one_initcall+0x64/0x380
         kernel_init_freeable+0x1c0/0x224
         kernel_init+0x18/0x12c
         ret_from_fork+0x14/0x2c
         0x0

other info that might help us debug this:

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&tz->lock);
                                 lock(&data->lock#2);
                                 lock(&tz->lock);
    lock(&data->lock#2);

   *** DEADLOCK ***

5 locks held by swapper/0/1:
   #0: c1c8648c (&dev->mutex){....}-{3:3}, at: __driver_attach+0xe4/0x1f0
   #1: c1210434 (cpu_hotplug_lock){++++}-{0:0}, at:
cpufreq_register_driver+0xc4/0x380
   #2: c1ed8298 (subsys mutex#8){+.+.}-{3:3}, at:
subsys_interface_register+0x4c/0x118
   #3: c131f944 (thermal_list_lock){+.+.}-{3:3}, at:
__thermal_cooling_device_register.part.0+0x238/0x354
   #4: c2979b94 (&tz->lock){+.+.}-{3:3}, at:
thermal_zone_device_update.part.0+0x3c/0x528

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc1-00083-ge5c9d117223e
#12945
Hardware name: Samsung Exynos (Flattened Device Tree)
   unwind_backtrace from show_stack+0x10/0x14
   show_stack from dump_stack_lvl+0x58/0x70
   dump_stack_lvl from check_noncircular+0xf0/0x158
   check_noncircular from __lock_acquire+0x15e8/0x2a7c
   __lock_acquire from lock_acquire+0x124/0x3e4
   lock_acquire from __mutex_lock+0x90/0x948
   __mutex_lock from mutex_lock_nested+0x1c/0x24
   mutex_lock_nested from exynos_get_temp+0x3c/0xc8
   exynos_get_temp from __thermal_zone_get_temp+0x5c/0x12c
   __thermal_zone_get_temp from thermal_zone_device_update.part.0+0x78/0x528
   thermal_zone_device_update.part.0 from
__thermal_cooling_device_register.part.0+0x298/0x354
   __thermal_cooling_device_register.part.0 from
__cpufreq_cooling_register.constprop.0+0x138/0x218
   __cpufreq_cooling_register.constprop.0 from
of_cpufreq_cooling_register+0x48/0x8c
   of_cpufreq_cooling_register from cpufreq_online+0x8d0/0xb2c
   cpufreq_online from cpufreq_add_dev+0xb0/0xec
   cpufreq_add_dev from subsys_interface_register+0x108/0x118
   subsys_interface_register from cpufreq_register_driver+0x15c/0x380
   cpufreq_register_driver from dt_cpufreq_probe+0x2e4/0x434
   dt_cpufreq_probe from platform_probe+0x5c/0xb8
   platform_probe from really_probe+0xe0/0x414
   really_probe from __driver_probe_device+0xa0/0x208
   __driver_probe_device from driver_probe_device+0x30/0xc0
   driver_probe_device from __driver_attach+0xf0/0x1f0
   __driver_attach from bus_for_each_dev+0x70/0xb0
   bus_for_each_dev from bus_add_driver+0x174/0x218
   bus_add_driver from driver_register+0x88/0x11c
   driver_register from do_one_initcall+0x64/0x380
   do_one_initcall from kernel_init_freeable+0x1c0/0x224
   kernel_init_freeable from kernel_init+0x18/0x12c
   kernel_init from ret_from_fork+0x14/0x2c
Exception stack(0xf082dfb0 to 0xf082dff8)
...

Let me know if You need anything more to test.


drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 2 -
.../ethernet/chelsio/cxgb4/cxgb4_thermal.c | 41 +----
drivers/platform/x86/acerhdf.c | 73 +++-----
drivers/thermal/armada_thermal.c | 39 ++---
drivers/thermal/broadcom/bcm2835_thermal.c | 8 +-
drivers/thermal/da9062-thermal.c | 52 +-----
drivers/thermal/gov_bang_bang.c | 39 +++--
drivers/thermal/gov_fair_share.c | 18 +-
drivers/thermal/gov_power_allocator.c | 51 +++---
drivers/thermal/gov_step_wise.c | 22 ++-
drivers/thermal/hisi_thermal.c | 11 +-
drivers/thermal/imx_thermal.c | 72 +++-----
.../int340x_thermal/int340x_thermal_zone.c | 33 ++--
.../int340x_thermal/int340x_thermal_zone.h | 4 +-
.../processor_thermal_device.c | 10 +-
drivers/thermal/intel/x86_pkg_temp_thermal.c | 120 +++++++------
drivers/thermal/qcom/qcom-spmi-temp-alarm.c | 39 ++---
drivers/thermal/rcar_gen3_thermal.c | 2 +-
drivers/thermal/rcar_thermal.c | 53 +-----
drivers/thermal/samsung/exynos_tmu.c | 57 +++----
drivers/thermal/st/st_thermal.c | 47 +----
drivers/thermal/tegra/soctherm.c | 33 ++--
drivers/thermal/tegra/tegra30-tsensor.c | 17 +-
drivers/thermal/thermal_core.c | 160 +++++++++++++++---
drivers/thermal/thermal_core.h | 24 +--
drivers/thermal/thermal_helpers.c | 28 +--
drivers/thermal/thermal_netlink.c | 21 +--
drivers/thermal/thermal_of.c | 116 -------------
drivers/thermal/thermal_sysfs.c | 133 +++++----------
drivers/thermal/ti-soc-thermal/ti-thermal.h | 15 --
drivers/thermal/uniphier_thermal.c | 27 ++-
include/linux/thermal.h | 10 ++
32 files changed, 559 insertions(+), 818 deletions(-)

Best regards



--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux