Hi Daniel, On 03.10.2022 11:25, Daniel Lezcano wrote: > This work is the pre-requisite of handling correctly when the trip > point are crossed. For that we need to rework how the trip points are > declared and assigned to a thermal zone. > > Even if it appears to be a common sense to have the trip points being > ordered, this no guarantee neither documentation telling that is the > case. > > One solution could have been to create an ordered array of trips built > when registering the thermal zone by calling the different get_trip* > ops. However those ops receive a thermal zone pointer which is not > known as it is in the process of creating it. > > This cyclic dependency shows we have to rework how we manage the trip > points. > > Actually, all the trip points definition can be common to the backend > sensor drivers and we can factor out the thermal trip structure in all > of them. > > Then, as we register the thermal trips array, they will be available > in the thermal zone structure and a core function can return the trip > given its id. > > The get_trip_* ops won't be needed anymore and could be removed. The > resulting code will be another step forward to a self encapsulated > generic thermal framework. > > Most of the drivers can be converted more or less easily. This series > does a first round with most of the drivers. Some remain and will be > converted but with a smaller set of changes as the conversion is a bit > more complex. > > Changelog: > v8: > - Pretty oneline change and parenthesis removal (Rafael) > - Collected tags > v7: > - Added missing return 0 in the x86_pkg_temp driver > v6: > - Improved the code for the get_crit_temp() function as suggested by > Rafael > - Removed inner parenthesis in the set_trip_temp() function and invert the > conditions. Check the type of the trip point is unchanged > - Folded patch 4 with 1 > - Add per thermal zone info message in the bang-bang governor > - Folded the fix for an uninitialized variable in > int340x_thermal_zone_add() > v5: > - Fixed a deadlock when calling thermal_zone_get_trip() while > handling the thermal zone lock > - Remove an extra line in the sysfs change > - Collected tags > v4: > - Remove extra lines on exynos changes as reported by Krzysztof Kozlowski > - Collected tags > v3: > - Reorg the series to be git-bisect safe > - Added the set_trip generic function > - Added the get_crit_temp generic function > - Removed more dead code in the thermal-of > - Fixed the exynos changelog > - Fixed the error check for the exynos drivers > - Collected tags > v2: > - Added missing EXPORT_SYMBOL_GPL() for thermal_zone_get_trip() > - Removed tab whitespace in the acerhdf driver > - Collected tags > > Cc: Raju Rangoju <rajur@xxxxxxxxxxx> > Cc: "David S. Miller" <davem@xxxxxxxxxxxxx> > Cc: Eric Dumazet <edumazet@xxxxxxxxxx> > Cc: Jakub Kicinski <kuba@xxxxxxxxxx> > Cc: Paolo Abeni <pabeni@xxxxxxxxxx> > Cc: Peter Kaestle <peter@xxxxxxxx> > Cc: Hans de Goede <hdegoede@xxxxxxxxxx> > Cc: Mark Gross <markgross@xxxxxxxxxx> > Cc: Miquel Raynal <miquel.raynal@xxxxxxxxxxx> > Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx> > Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx> > Cc: Amit Kucheria <amitk@xxxxxxxxxx> > Cc: Zhang Rui <rui.zhang@xxxxxxxxx> > Cc: Nicolas Saenz Julienne <nsaenz@xxxxxxxxxx> > Cc: Broadcom Kernel Team <bcm-kernel-feedback-list@xxxxxxxxxxxx> > Cc: Florian Fainelli <f.fainelli@xxxxxxxxx> > Cc: Ray Jui <rjui@xxxxxxxxxxxx> > Cc: Scott Branden <sbranden@xxxxxxxxxxxx> > Cc: Support Opensource <support.opensource@xxxxxxxxxxx> > Cc: Lukasz Luba <lukasz.luba@xxxxxxx> > Cc: Shawn Guo <shawnguo@xxxxxxxxxx> > Cc: Sascha Hauer <s.hauer@xxxxxxxxxxxxxx> > Cc: Pengutronix Kernel Team <kernel@xxxxxxxxxxxxxx> > Cc: Fabio Estevam <festevam@xxxxxxxxx> > Cc: NXP Linux Team <linux-imx@xxxxxxx> > Cc: Thara Gopinath <thara.gopinath@xxxxxxxxxx> > Cc: Andy Gross <agross@xxxxxxxxxx> > Cc: Bjorn Andersson <bjorn.andersson@xxxxxxxxxx> > Cc: "Niklas Söderlund" <niklas.soderlund@xxxxxxxxxxxx> > Cc: Bartlomiej Zolnierkiewicz <bzolnier@xxxxxxxxx> > Cc: Krzysztof Kozlowski <krzysztof.kozlowski@xxxxxxxxxx> > Cc: Alim Akhtar <alim.akhtar@xxxxxxxxxxx> > Cc: Thierry Reding <thierry.reding@xxxxxxxxx> > Cc: Jonathan Hunter <jonathanh@xxxxxxxxxx> > Cc: Eduardo Valentin <edubezval@xxxxxxxxx> > Cc: Keerthy <j-keerthy@xxxxxx> > Cc: Kunihiko Hayashi <hayashi.kunihiko@xxxxxxxxxxxxx> > Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx> > Cc: Antoine Tenart <atenart@xxxxxxxxxx> > Cc: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx> > Cc: Dmitry Osipenko <digetx@xxxxxxxxx> > Cc: netdev@xxxxxxxxxxxxxxx > Cc: linux-kernel@xxxxxxxxxxxxxxx > Cc: platform-driver-x86@xxxxxxxxxxxxxxx > Cc: linux-pm@xxxxxxxxxxxxxxx > Cc: linux-rpi-kernel@xxxxxxxxxxxxxxxxxxx > Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx > Cc: linux-arm-msm@xxxxxxxxxxxxxxx > Cc: linux-renesas-soc@xxxxxxxxxxxxxxx > Cc: linux-samsung-soc@xxxxxxxxxxxxxxx > Cc: linux-tegra@xxxxxxxxxxxxxxx > Cc: linux-omap@xxxxxxxxxxxxxxx > > Daniel Lezcano (29): > thermal/core: Add a generic thermal_zone_get_trip() function > thermal/sysfs: Always expose hysteresis attributes > thermal/core: Add a generic thermal_zone_set_trip() function > thermal/core/governors: Use thermal_zone_get_trip() instead of ops > functions > thermal/of: Use generic thermal_zone_get_trip() function > thermal/of: Remove unused functions > thermal/drivers/exynos: Use generic thermal_zone_get_trip() function > thermal/drivers/exynos: of_thermal_get_ntrips() > thermal/drivers/exynos: Replace of_thermal_is_trip_valid() by > thermal_zone_get_trip() > thermal/drivers/tegra: Use generic thermal_zone_get_trip() function > thermal/drivers/uniphier: Use generic thermal_zone_get_trip() function > thermal/drivers/hisi: Use generic thermal_zone_get_trip() function > thermal/drivers/qcom: Use generic thermal_zone_get_trip() function > thermal/drivers/armada: Use generic thermal_zone_get_trip() function > thermal/drivers/rcar_gen3: Use the generic function to get the number > of trips > thermal/of: Remove of_thermal_get_ntrips() > thermal/of: Remove of_thermal_is_trip_valid() > thermal/of: Remove of_thermal_set_trip_hyst() > thermal/of: Remove of_thermal_get_crit_temp() > thermal/drivers/st: Use generic trip points > thermal/drivers/imx: Use generic thermal_zone_get_trip() function > thermal/drivers/rcar: Use generic thermal_zone_get_trip() function > thermal/drivers/broadcom: Use generic thermal_zone_get_trip() function > thermal/drivers/da9062: Use generic thermal_zone_get_trip() function > thermal/drivers/ti: Remove unused macros ti_thermal_get_trip_value() / > ti_thermal_trip_is_valid() > thermal/drivers/acerhdf: Use generic thermal_zone_get_trip() function > thermal/drivers/cxgb4: Use generic thermal_zone_get_trip() function > thermal/intel/int340x: Replace parameter to simplify > thermal/drivers/intel: Use generic thermal_zone_get_trip() function I've tested this v8 patchset after fixing the issue with Exynos TMU with https://lore.kernel.org/all/20221003132943.1383065-1-daniel.lezcano@xxxxxxxxxx/ patch and I got the following lockdep warning on all Exynos-based boards: ====================================================== WARNING: possible circular locking dependency detected 6.0.0-rc1-00083-ge5c9d117223e #12945 Not tainted ------------------------------------------------------ swapper/0/1 is trying to acquire lock: c1ce66b0 (&data->lock#2){+.+.}-{3:3}, at: exynos_get_temp+0x3c/0xc8 but task is already holding lock: c2979b94 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_device_update.part.0+0x3c/0x528 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&tz->lock){+.+.}-{3:3}: mutex_lock_nested+0x1c/0x24 thermal_zone_get_trip+0x20/0x44 exynos_tmu_initialize+0x144/0x1e0 exynos_tmu_probe+0x2b0/0x728 platform_probe+0x5c/0xb8 really_probe+0xe0/0x414 __driver_probe_device+0xa0/0x208 driver_probe_device+0x30/0xc0 __driver_attach+0xf0/0x1f0 bus_for_each_dev+0x70/0xb0 bus_add_driver+0x174/0x218 driver_register+0x88/0x11c do_one_initcall+0x64/0x380 kernel_init_freeable+0x1c0/0x224 kernel_init+0x18/0x12c ret_from_fork+0x14/0x2c 0x0 -> #0 (&data->lock#2){+.+.}-{3:3}: lock_acquire+0x124/0x3e4 __mutex_lock+0x90/0x948 mutex_lock_nested+0x1c/0x24 exynos_get_temp+0x3c/0xc8 __thermal_zone_get_temp+0x5c/0x12c thermal_zone_device_update.part.0+0x78/0x528 __thermal_cooling_device_register.part.0+0x298/0x354 __cpufreq_cooling_register.constprop.0+0x138/0x218 of_cpufreq_cooling_register+0x48/0x8c cpufreq_online+0x8d0/0xb2c cpufreq_add_dev+0xb0/0xec subsys_interface_register+0x108/0x118 cpufreq_register_driver+0x15c/0x380 dt_cpufreq_probe+0x2e4/0x434 platform_probe+0x5c/0xb8 really_probe+0xe0/0x414 __driver_probe_device+0xa0/0x208 driver_probe_device+0x30/0xc0 __driver_attach+0xf0/0x1f0 bus_for_each_dev+0x70/0xb0 bus_add_driver+0x174/0x218 driver_register+0x88/0x11c do_one_initcall+0x64/0x380 kernel_init_freeable+0x1c0/0x224 kernel_init+0x18/0x12c ret_from_fork+0x14/0x2c 0x0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&tz->lock); lock(&data->lock#2); lock(&tz->lock); lock(&data->lock#2); *** DEADLOCK *** 5 locks held by swapper/0/1: #0: c1c8648c (&dev->mutex){....}-{3:3}, at: __driver_attach+0xe4/0x1f0 #1: c1210434 (cpu_hotplug_lock){++++}-{0:0}, at: cpufreq_register_driver+0xc4/0x380 #2: c1ed8298 (subsys mutex#8){+.+.}-{3:3}, at: subsys_interface_register+0x4c/0x118 #3: c131f944 (thermal_list_lock){+.+.}-{3:3}, at: __thermal_cooling_device_register.part.0+0x238/0x354 #4: c2979b94 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_device_update.part.0+0x3c/0x528 stack backtrace: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc1-00083-ge5c9d117223e #12945 Hardware name: Samsung Exynos (Flattened Device Tree) unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from check_noncircular+0xf0/0x158 check_noncircular from __lock_acquire+0x15e8/0x2a7c __lock_acquire from lock_acquire+0x124/0x3e4 lock_acquire from __mutex_lock+0x90/0x948 __mutex_lock from mutex_lock_nested+0x1c/0x24 mutex_lock_nested from exynos_get_temp+0x3c/0xc8 exynos_get_temp from __thermal_zone_get_temp+0x5c/0x12c __thermal_zone_get_temp from thermal_zone_device_update.part.0+0x78/0x528 thermal_zone_device_update.part.0 from __thermal_cooling_device_register.part.0+0x298/0x354 __thermal_cooling_device_register.part.0 from __cpufreq_cooling_register.constprop.0+0x138/0x218 __cpufreq_cooling_register.constprop.0 from of_cpufreq_cooling_register+0x48/0x8c of_cpufreq_cooling_register from cpufreq_online+0x8d0/0xb2c cpufreq_online from cpufreq_add_dev+0xb0/0xec cpufreq_add_dev from subsys_interface_register+0x108/0x118 subsys_interface_register from cpufreq_register_driver+0x15c/0x380 cpufreq_register_driver from dt_cpufreq_probe+0x2e4/0x434 dt_cpufreq_probe from platform_probe+0x5c/0xb8 platform_probe from really_probe+0xe0/0x414 really_probe from __driver_probe_device+0xa0/0x208 __driver_probe_device from driver_probe_device+0x30/0xc0 driver_probe_device from __driver_attach+0xf0/0x1f0 __driver_attach from bus_for_each_dev+0x70/0xb0 bus_for_each_dev from bus_add_driver+0x174/0x218 bus_add_driver from driver_register+0x88/0x11c driver_register from do_one_initcall+0x64/0x380 do_one_initcall from kernel_init_freeable+0x1c0/0x224 kernel_init_freeable from kernel_init+0x18/0x12c kernel_init from ret_from_fork+0x14/0x2c Exception stack(0xf082dfb0 to 0xf082dff8) ... Let me know if You need anything more to test. > drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 2 - > .../ethernet/chelsio/cxgb4/cxgb4_thermal.c | 41 +---- > drivers/platform/x86/acerhdf.c | 73 +++----- > drivers/thermal/armada_thermal.c | 39 ++--- > drivers/thermal/broadcom/bcm2835_thermal.c | 8 +- > drivers/thermal/da9062-thermal.c | 52 +----- > drivers/thermal/gov_bang_bang.c | 39 +++-- > drivers/thermal/gov_fair_share.c | 18 +- > drivers/thermal/gov_power_allocator.c | 51 +++--- > drivers/thermal/gov_step_wise.c | 22 ++- > drivers/thermal/hisi_thermal.c | 11 +- > drivers/thermal/imx_thermal.c | 72 +++----- > .../int340x_thermal/int340x_thermal_zone.c | 33 ++-- > .../int340x_thermal/int340x_thermal_zone.h | 4 +- > .../processor_thermal_device.c | 10 +- > drivers/thermal/intel/x86_pkg_temp_thermal.c | 120 +++++++------ > drivers/thermal/qcom/qcom-spmi-temp-alarm.c | 39 ++--- > drivers/thermal/rcar_gen3_thermal.c | 2 +- > drivers/thermal/rcar_thermal.c | 53 +----- > drivers/thermal/samsung/exynos_tmu.c | 57 +++---- > drivers/thermal/st/st_thermal.c | 47 +---- > drivers/thermal/tegra/soctherm.c | 33 ++-- > drivers/thermal/tegra/tegra30-tsensor.c | 17 +- > drivers/thermal/thermal_core.c | 160 +++++++++++++++--- > drivers/thermal/thermal_core.h | 24 +-- > drivers/thermal/thermal_helpers.c | 28 +-- > drivers/thermal/thermal_netlink.c | 21 +-- > drivers/thermal/thermal_of.c | 116 ------------- > drivers/thermal/thermal_sysfs.c | 133 +++++---------- > drivers/thermal/ti-soc-thermal/ti-thermal.h | 15 -- > drivers/thermal/uniphier_thermal.c | 27 ++- > include/linux/thermal.h | 10 ++ > 32 files changed, 559 insertions(+), 818 deletions(-) > Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland