Re: [PATCH 0/5] Fix a deadlock with clk_pm_runtime_get()

Miquel Raynal <miquel.raynal@xxxxxxxxxxx> · Fri, 13 Sep 2024 21:32:19 +0200

Hi Stephen,

I only discover this thread today, interesting read!

sboyd@xxxxxxxxxx wrote on Sun, 24 Mar 2024 22:43:57 -0700:

> This patch series fixes a deadlock reported[1] on ChromeOS devices
> (Qualcomm sc7180 Trogdor). To get there, we allow __clk_release() to run

Not only ChromeOS devices are affected, there have been several reports
with similar issues on the mailing list, especially on i.MX8MP, where
the clock and power management domains are tightly connected.

> without the prepare_lock held. Then we add runtime PM enabled clk_core
> structs to a list that we iterate and enable runtime PM for each entry
> before grabbing the prepare_lock to walk the clk tree. The details are
> in patch #4.

I am happy we ended-up leaning to the same solution: runtime PM calls
should no longer happen after acquiring the prepare lock.

> The patch after that is based on the analysis in the disable unused
> patch. We similarly resume devices from runtime suspend when walking the
> clk tree for the debugfs clk_summary.
> 
> Unfortunately this doesn't fix all problems with the usage of runtime PM
> in the clk framework. We still have a problem if preparing a clk happens
> in parallel to the device providing that clk runtime resuming or
> suspending. In that case, the task will go to sleep waiting for the
> runtime PM state to change, and we'll deadlock. This is primarily a
> problem with the global prepare_lock. I suspect we'll be able to fix
> this by implementing per-clk locking, because then we will be able to
> split up the big prepare_lock into smaller locks that don't deadlock on
> some device runtime PM transitions.

I fear splitting the locks will actually not solve the problems we
encounter on i.MX8.

Let me quote some parts of your commit log in patch 4/5:

---8<---
> This is a classic ABBA deadlock. To properly fix the deadlock, we
> must never runtime PM resume or suspend a device with the clk
> prepare_lock held. Actually doing that is near impossible today
> because the global prepare_lock would have to be dropped in the
> middle of the tree, the device runtime PM resumed/suspended, and then
> the prepare_lock grabbed again to ensure consistency of the clk tree
> topology. If anything changes with the clk tree in the meantime,
> we've lost and will need to start the operation all over again.
>
> Luckily, most of the time we're simply incrementing or decrementing
> the runtime PM count on an active device, so we don't have the chance
> to schedule away with the prepare_lock held. Let's fix this immediate
> problem that can be triggered more easily by simply booting on
> Qualcomm sc7180.
--->8---

Regarding your former statement, I don't think it is impossible, this
is what I've been trying to do recently. It is really impacting, and
must be handled specifically for each situation: I am counting three of
them depending on the action, where either the parents, or the
children or both sides of the tree should be resumed before
continuing (with optionally the new parent, when reparenting explicitly
or doing some rate changes which also involve reparenting). I don't yet
have a working proof-of-concept -I would have loved to before LPC- but
this is promising and I believe doable.

About your second paragraph however, I am asking whether being "most of
the time" incrementing or decrementing the runtime PM count is
acceptable, because if we ever perform a real state change within the
right conditions, we will just deadlock the platform. This is
theoretical and is unlikely to happen in general, I agree. But shall we
consider this situation too unlikely from happening for just ignoring
it? Shall we instead fix it properly and prepare ourselves for future
power-optimized architecture with a lot of dependencies between clocks
and other power-related subsystems? This is an open question.

> I'll start working on that problem in earnest now because I'm worried
> we're going to run into that problem very soon.

Is there any public branch I could look into?

On my side I tried to warn about this but got no feedback. I'd have
loved to be pointed towards this patchset at that time :) Here is the
report if you want to check it out. FYI, I was asking for feedback on
very specific questions, which I consider now solved:
https://lore.kernel.org/all/20240527181928.4fc6b5f0@xps-13/

> Stephen Boyd (5):
>   clk: Remove prepare_lock hold assertion in __clk_release()
>   clk: Don't hold prepare_lock when calling kref_put()
>   clk: Initialize struct clk_core kref earlier
>   clk: Get runtime PM before walking tree during disable_unused
>   clk: Get runtime PM before walking tree for clk_summary
> 
>  drivers/clk/clk.c | 142 +++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 115 insertions(+), 27 deletions(-)
> 
> Cc: Douglas Anderson <dianders@xxxxxxxxxxxx>
> Cc: Krzysztof Kozlowski <krzk@xxxxxxxxxx>
> Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> Cc: Taniya Das <quic_tdas@xxxxxxxxxxx>
> Cc: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> 
> [1] https://lore.kernel.org/all/20220922084322.RFC.2.I375b6b9e0a0a5348962f004beb3dafee6a12dfbb@changeid/
> 
> base-commit: e8f897f4afef0031fe618a8e94127a0934896aba

Thanks,
Miquèl