Quoting Stephen Boyd (2024-03-25 11:41:58) > Doug reported [1] the following hung task: > > INFO: task swapper/0:1 blocked for more than 122 seconds. > Not tainted 5.15.149-21875-gf795ebc40eb8 #1 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00000008 > Call trace: > __switch_to+0xf4/0x1f4 > __schedule+0x418/0xb80 > schedule+0x5c/0x10c > rpm_resume+0xe0/0x52c > rpm_resume+0x178/0x52c > __pm_runtime_resume+0x58/0x98 > clk_pm_runtime_get+0x30/0xb0 > clk_disable_unused_subtree+0x58/0x208 > clk_disable_unused_subtree+0x38/0x208 > clk_disable_unused_subtree+0x38/0x208 > clk_disable_unused_subtree+0x38/0x208 > clk_disable_unused_subtree+0x38/0x208 > clk_disable_unused+0x4c/0xe4 > do_one_initcall+0xcc/0x2d8 > do_initcall_level+0xa4/0x148 > do_initcalls+0x5c/0x9c > do_basic_setup+0x24/0x30 > kernel_init_freeable+0xec/0x164 > kernel_init+0x28/0x120 > ret_from_fork+0x10/0x20 > INFO: task kworker/u16:0:9 blocked for more than 122 seconds. > Not tainted 5.15.149-21875-gf795ebc40eb8 #1 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > task:kworker/u16:0 state:D stack: 0 pid: 9 ppid: 2 flags:0x00000008 > Workqueue: events_unbound deferred_probe_work_func > Call trace: > __switch_to+0xf4/0x1f4 > __schedule+0x418/0xb80 > schedule+0x5c/0x10c > schedule_preempt_disabled+0x2c/0x48 > __mutex_lock+0x238/0x488 > __mutex_lock_slowpath+0x1c/0x28 > mutex_lock+0x50/0x74 > clk_prepare_lock+0x7c/0x9c > clk_core_prepare_lock+0x20/0x44 > clk_prepare+0x24/0x30 > clk_bulk_prepare+0x40/0xb0 > mdss_runtime_resume+0x54/0x1c8 > pm_generic_runtime_resume+0x30/0x44 > __genpd_runtime_resume+0x68/0x7c > genpd_runtime_resume+0x108/0x1f4 > __rpm_callback+0x84/0x144 > rpm_callback+0x30/0x88 > rpm_resume+0x1f4/0x52c > rpm_resume+0x178/0x52c > __pm_runtime_resume+0x58/0x98 > __device_attach+0xe0/0x170 > device_initial_probe+0x1c/0x28 > bus_probe_device+0x3c/0x9c > device_add+0x644/0x814 > mipi_dsi_device_register_full+0xe4/0x170 > devm_mipi_dsi_device_register_full+0x28/0x70 > ti_sn_bridge_probe+0x1dc/0x2c0 > auxiliary_bus_probe+0x4c/0x94 > really_probe+0xcc/0x2c8 > __driver_probe_device+0xa8/0x130 > driver_probe_device+0x48/0x110 > __device_attach_driver+0xa4/0xcc > bus_for_each_drv+0x8c/0xd8 > __device_attach+0xf8/0x170 > device_initial_probe+0x1c/0x28 > bus_probe_device+0x3c/0x9c > deferred_probe_work_func+0x9c/0xd8 > process_one_work+0x148/0x518 > worker_thread+0x138/0x350 > kthread+0x138/0x1e0 > ret_from_fork+0x10/0x20 > > The first thread is walking the clk tree and calling > clk_pm_runtime_get() to power on devices required to read the clk > hardware via struct clk_ops::is_enabled(). This thread holds the clk > prepare_lock, and is trying to runtime PM resume a device, when it finds > that the device is in the process of resuming so the thread schedule()s > away waiting for the device to finish resuming before continuing. The > second thread is runtime PM resuming the same device, but the runtime > resume callback is calling clk_prepare(), trying to grab the > prepare_lock waiting on the first thread. > > This is a classic ABBA deadlock. To properly fix the deadlock, we must > never runtime PM resume or suspend a device with the clk prepare_lock > held. Actually doing that is near impossible today because the global > prepare_lock would have to be dropped in the middle of the tree, the > device runtime PM resumed/suspended, and then the prepare_lock grabbed > again to ensure consistency of the clk tree topology. If anything > changes with the clk tree in the meantime, we've lost and will need to > start the operation all over again. > > Luckily, most of the time we're simply incrementing or decrementing the > runtime PM count on an active device, so we don't have the chance to > schedule away with the prepare_lock held. Let's fix this immediate > problem that can be triggered more easily by simply booting on Qualcomm > sc7180. > > Introduce a list of clk_core structures that have been registered, or > are in the process of being registered, that require runtime PM to > operate. Iterate this list and call clk_pm_runtime_get() on each of them > without holding the prepare_lock during clk_disable_unused(). This way > we can be certain that the runtime PM state of the devices will be > active and resumed so we can't schedule away while walking the clk tree > with the prepare_lock held. Similarly, call clk_pm_runtime_put() without > the prepare_lock held to properly drop the runtime PM reference. We > remove the calls to clk_pm_runtime_{get,put}() in this path because > they're superfluous now that we know the devices are runtime resumed. > > Reported-by: Douglas Anderson <dianders@xxxxxxxxxxxx> > Closes: https://lore.kernel.org/all/20220922084322.RFC.2.I375b6b9e0a0a5348962f004beb3dafee6a12dfbb@changeid/ [1] > Closes: https://issuetracker.google.com/328070191 > Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> > Cc: Ulf Hansson <ulf.hansson@xxxxxxxxxx> > Cc: Krzysztof Kozlowski <krzk@xxxxxxxxxx> > Fixes: 9a34b45397e5 ("clk: Add support for runtime PM") > Signed-off-by: Stephen Boyd <sboyd@xxxxxxxxxx> > --- Applied to clk-fixes