Re: [PATCH v2 4/5] clk: Get runtime PM before walking tree during disable_unused

Ulf Hansson <ulf.hansson@xxxxxxxxxx> · Tue, 9 Apr 2024 12:32:04 +0200

On Mon, 8 Apr 2024 at 04:36, Stephen Boyd <sboyd@xxxxxxxxxx> wrote:
>
> Quoting Stephen Boyd (2024-03-25 11:41:58)
> > Doug reported [1] the following hung task:
> >
> >  INFO: task swapper/0:1 blocked for more than 122 seconds.
> >        Not tainted 5.15.149-21875-gf795ebc40eb8 #1
> >  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >  task:swapper/0       state:D stack:    0 pid:    1 ppid:     0 flags:0x00000008
> >  Call trace:
> >   __switch_to+0xf4/0x1f4
> >   __schedule+0x418/0xb80
> >   schedule+0x5c/0x10c
> >   rpm_resume+0xe0/0x52c
> >   rpm_resume+0x178/0x52c
> >   __pm_runtime_resume+0x58/0x98
> >   clk_pm_runtime_get+0x30/0xb0
> >   clk_disable_unused_subtree+0x58/0x208
> >   clk_disable_unused_subtree+0x38/0x208
> >   clk_disable_unused_subtree+0x38/0x208
> >   clk_disable_unused_subtree+0x38/0x208
> >   clk_disable_unused_subtree+0x38/0x208
> >   clk_disable_unused+0x4c/0xe4
> >   do_one_initcall+0xcc/0x2d8
> >   do_initcall_level+0xa4/0x148
> >   do_initcalls+0x5c/0x9c
> >   do_basic_setup+0x24/0x30
> >   kernel_init_freeable+0xec/0x164
> >   kernel_init+0x28/0x120
> >   ret_from_fork+0x10/0x20
> >  INFO: task kworker/u16:0:9 blocked for more than 122 seconds.
> >        Not tainted 5.15.149-21875-gf795ebc40eb8 #1
> >  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >  task:kworker/u16:0   state:D stack:    0 pid:    9 ppid:     2 flags:0x00000008
> >  Workqueue: events_unbound deferred_probe_work_func
> >  Call trace:
> >   __switch_to+0xf4/0x1f4
> >   __schedule+0x418/0xb80
> >   schedule+0x5c/0x10c
> >   schedule_preempt_disabled+0x2c/0x48
> >   __mutex_lock+0x238/0x488
> >   __mutex_lock_slowpath+0x1c/0x28
> >   mutex_lock+0x50/0x74
> >   clk_prepare_lock+0x7c/0x9c
> >   clk_core_prepare_lock+0x20/0x44
> >   clk_prepare+0x24/0x30
> >   clk_bulk_prepare+0x40/0xb0
> >   mdss_runtime_resume+0x54/0x1c8
> >   pm_generic_runtime_resume+0x30/0x44
> >   __genpd_runtime_resume+0x68/0x7c
> >   genpd_runtime_resume+0x108/0x1f4
> >   __rpm_callback+0x84/0x144
> >   rpm_callback+0x30/0x88
> >   rpm_resume+0x1f4/0x52c
> >   rpm_resume+0x178/0x52c
> >   __pm_runtime_resume+0x58/0x98
> >   __device_attach+0xe0/0x170
> >   device_initial_probe+0x1c/0x28
> >   bus_probe_device+0x3c/0x9c
> >   device_add+0x644/0x814
> >   mipi_dsi_device_register_full+0xe4/0x170
> >   devm_mipi_dsi_device_register_full+0x28/0x70
> >   ti_sn_bridge_probe+0x1dc/0x2c0
> >   auxiliary_bus_probe+0x4c/0x94
> >   really_probe+0xcc/0x2c8
> >   __driver_probe_device+0xa8/0x130
> >   driver_probe_device+0x48/0x110
> >   __device_attach_driver+0xa4/0xcc
> >   bus_for_each_drv+0x8c/0xd8
> >   __device_attach+0xf8/0x170
> >   device_initial_probe+0x1c/0x28
> >   bus_probe_device+0x3c/0x9c
> >   deferred_probe_work_func+0x9c/0xd8
> >   process_one_work+0x148/0x518
> >   worker_thread+0x138/0x350
> >   kthread+0x138/0x1e0
> >   ret_from_fork+0x10/0x20
> >
> > The first thread is walking the clk tree and calling
> > clk_pm_runtime_get() to power on devices required to read the clk
> > hardware via struct clk_ops::is_enabled(). This thread holds the clk
> > prepare_lock, and is trying to runtime PM resume a device, when it finds
> > that the device is in the process of resuming so the thread schedule()s
> > away waiting for the device to finish resuming before continuing. The
> > second thread is runtime PM resuming the same device, but the runtime
> > resume callback is calling clk_prepare(), trying to grab the
> > prepare_lock waiting on the first thread.
> >
> > This is a classic ABBA deadlock. To properly fix the deadlock, we must
> > never runtime PM resume or suspend a device with the clk prepare_lock
> > held. Actually doing that is near impossible today because the global
> > prepare_lock would have to be dropped in the middle of the tree, the
> > device runtime PM resumed/suspended, and then the prepare_lock grabbed
> > again to ensure consistency of the clk tree topology. If anything
> > changes with the clk tree in the meantime, we've lost and will need to
> > start the operation all over again.
> >
> > Luckily, most of the time we're simply incrementing or decrementing the
> > runtime PM count on an active device, so we don't have the chance to
> > schedule away with the prepare_lock held. Let's fix this immediate
> > problem that can be triggered more easily by simply booting on Qualcomm
> > sc7180.
> >
> > Introduce a list of clk_core structures that have been registered, or
> > are in the process of being registered, that require runtime PM to
> > operate. Iterate this list and call clk_pm_runtime_get() on each of them
> > without holding the prepare_lock during clk_disable_unused(). This way
> > we can be certain that the runtime PM state of the devices will be
> > active and resumed so we can't schedule away while walking the clk tree
> > with the prepare_lock held. Similarly, call clk_pm_runtime_put() without
> > the prepare_lock held to properly drop the runtime PM reference. We
> > remove the calls to clk_pm_runtime_{get,put}() in this path because
> > they're superfluous now that we know the devices are runtime resumed.
> >
> > Reported-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> > Closes: https://lore.kernel.org/all/20220922084322.RFC.2.I375b6b9e0a0a5348962f004beb3dafee6a12dfbb@changeid/ [1]
> > Closes: https://issuetracker.google.com/328070191
> > Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> > Cc: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> > Cc: Krzysztof Kozlowski <krzk@xxxxxxxxxx>
> > Fixes: 9a34b45397e5 ("clk: Add support for runtime PM")
> > Signed-off-by: Stephen Boyd <sboyd@xxxxxxxxxx>
> > ---
>
> Applied to clk-fixes

Apologies for not reviewing this in time; it got lost in my email
filters. It looks like you managed to solve the lock ordering problem
in the clk disable-unused path - great!

However, I think the main problem behind this kind of locking issue is
that we hold a global lock while calling into pm_runtime_get|put*().
Similar problems have been reported in the past. It's been on my todo
list for quite some time to have a closer look, but I haven't gotten
to it yet.

Without going into too much detail, let me just ask a related
question. Would it not be possible to call pm_runtime_get/put() within
the clock framework, without *always* keeping the clock prepare lock
acquired? I assume a clock can't be unregistered as long as there is a
reference taken for it, right? Wouldn't that be a sufficient guarantee
that it's okay to runtime_resume|suspend its corresponding device?
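
To make the question a bit more concrete, here is a rough,
single-threaded userspace model of the two orderings. It is only a
sketch and not kernel code: every name in it (model_dev,
model_prepare_lock_held, model_pm_get() and so on) is hypothetical, the
"lock" is just a flag, and the runtime PM calls are reduced to a usage
counter. It only records whether a resume or suspend transition happened
while the lock flag was set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device with a runtime PM usage count. */
struct model_dev {
	int usage_count;
	bool active;
	bool transition_under_lock; /* resume/suspend ran with the lock held */
};

/* Hypothetical stand-in for the global clk prepare_lock (a flag only). */
static bool model_prepare_lock_held;

/* Models a runtime PM get: the 0 -> 1 transition is the one that may sleep. */
static void model_pm_get(struct model_dev *dev)
{
	if (dev->usage_count++ == 0) {
		if (model_prepare_lock_held)
			dev->transition_under_lock = true;
		dev->active = true;
	}
}

/* Models a runtime PM put: the 1 -> 0 transition suspends the device. */
static void model_pm_put(struct model_dev *dev)
{
	if (--dev->usage_count == 0) {
		if (model_prepare_lock_held)
			dev->transition_under_lock = true;
		dev->active = false;
	}
}

/* Ordering asked about above: runtime PM reference taken outside the lock. */
static int model_prepare_pm_outside(struct model_dev *dev, int *prepare_count)
{
	int ret;

	model_pm_get(dev);               /* may "sleep", lock not held */
	model_prepare_lock_held = true;  /* take the lock */
	ret = dev->active ? 0 : -1;      /* hardware access needs an active device */
	if (!ret)
		(*prepare_count)++;
	model_prepare_lock_held = false; /* drop the lock */
	model_pm_put(dev);               /* may "sleep", lock not held */
	return ret;
}

/* Problematic ordering: runtime PM transitions happen under the lock. */
static int model_prepare_pm_inside(struct model_dev *dev, int *prepare_count)
{
	model_prepare_lock_held = true;
	model_pm_get(dev);               /* resume with the lock held */
	(*prepare_count)++;
	model_pm_put(dev);               /* suspend with the lock held */
	model_prepare_lock_held = false;
	return 0;
}
```

In this model only model_prepare_pm_inside() ever sets
transition_under_lock, which is the condition that allowed the ABBA
deadlock in the report. Whether the real framework can rely on the clk
reference alone to keep the device around is exactly the open question.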

Or maybe I should just send a patch. :-)

Kind regards
Uffe