Re: clk: qcom: genpd lockdep warning in gdsc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 1 Nov 2022 at 03:43, Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
>
> Quoting Dmitry Baryshkov (2022-10-27 11:13:44)
> > On 27/10/2022 01:18, Stephen Boyd wrote:
> > > Reviving this old thread because this commit has lead to a couple bugs
> > > now.
> > >
> > > Quoting Ulf Hansson (2022-06-22 03:26:52)
> > >> On Fri, 17 Jun 2022 at 21:58, Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
> > >>>
> > >>> Hi Bjorn and Dmitry,
> > >>>
> > >>> Yu reported a lockdep warning coming from the gdsc driver. It looks like
> > >>> the runtime PM usage in gdsc.c is causing lockdep to see an AA deadlock
> > >>> possibility with 'genpd->mlock'. I suspect this is because we have
> > >>> commit 1b771839de05 ("clk: qcom: gdsc: enable optional power domain
> > >>> support"), and that is now calling runtime PM code from within the genpd
> > >>> code.
> > >
> > > This commit has caused a deadlock at boot for Doug[1] and I see that the
> > > camera driver on Google CoachZ and Wormdingler devices doesn't work
> > > after resuming from suspend once this commit is applied. I'm leaning
> > > towards sending a revert, because it seems to cause 3 issues while
> > > removing the regulator hack that was in place to enable MMCX. This patch
> > > is cleaning up the hack, but trading the hack for three more problems.
> > >
> > >> I think genpd already has nested lock support, so the only
> > >>> solution is to not use runtime PM from within genpd code and start
> > >>> expressing genpd parent relationships in genpd itself?
> > >>
> > >> Not sure exactly what you mean here, but yes, expressing the
> > >> parent/child domain relationship is always needed.
> > >>
> > >> Having gdsc_disable() to do runtime PM calls (gdsc_pm_runtime_put())
> > >> seems awkward to me. Why is that needed, more exactly?
> > >
> > > It seems like this is needed so that the gdsc_enable() and
> > > gdsc_disable() calls can read/write the registers for the genpd, while
> > > those registers live in some clk controller that needs either a
> > > different clk (probably some AHB clk) or another genpd to be enabled. It
> > > looks like for qcom,sm8250-dispcc it relies on MMCX gdsc (which is a
> > > genpd). From a hardware view, the MDSS_GDSC provided by the display clk
> > > controller is probably not a sub-domain of MMCX. Instead, we need to
> > > have MMCX enabled so that we can access the registers for the MDSS GDSC.
> >
> > Yes, exactly.
>
> Thanks for confirming. I've debugged further and found that we can't use
> runtime PM calls here in the gdsc enable/disable functions at all. What
> happens is runtime PM is disabled during device late suspend (see
> __device_suspend_late() and how it calls __pm_runtime_disable() early
> on). All genpds are assigned noirq phase suspend/resume operations that
> will power on and off the genpd during system wide suspend/resume (see
> the genpd->domain.ops assignments in pm_genpd_init() and how noirq PM
> ops are used).
>
> When it comes time to resume the system, we'll try to enable the gdsc,
> and call into gdsc_enable() which will try to call pm_runtime APIs on
> the parent clk controller during the noirq phase of resume and that is
> doomed to fail.
>
> >
> > >
> > > My question is if it makes sense to simply describe that the GDSCs
> > > provided by a device are sub-domains of whatever power domains are
> > > listed in DT for that device? I think if we did that here for sm8250
> > > dispcc, we wouldn't need to use runtime PM within the genpd code
> > > assuming that the MMCX parent genpd is enabled before we attempt to
> > > read/write the dispcc gdsc registers. Hopefully that is also done, i.e.
> > > enabling parent domains before enabling child domains if the parent is
> > > disabled.
> >
> > I will check this tomorrow. It should be possible to handle the
> > MMCX/MDSS_GDSC relationship in this way.
> >
> > > Is this already being done with pm_genpd_add_subdomain() in
> > > gdsc_register()? I see that we're attaching that to dispcc's struct
> > > device::pm_domain, but I assume that is different from the MMCX genpd.
> >
> > No, I think the only domain there is the MMCX domain, so this call s
> >
>
> Did this get deleted?

No, the regulator was deleted. But for some time we had the following
in display and video clock controller nodes:

power-domains = <&rpmhpd SM8250_MMCX>;
required-opps = <&rpmhpd_opp_low_svs>;

>
> > > Maybe that is the problem here. Dmitry can you further describe the
> > > problem being solved?
> >
> > I must admit, I don't remember what caused me to add this call. May be
> > it was added before me getting the pm_genpd_add_subdomain() call in place.
> >
>
> I see that the 'dev->pm_domain' pointer isn't assigned unless I have a
> 'power-domains = <&some_pd>' property in DT for the clock controller
> registering GDSCs. This means that a driver using the pm clock APIs
> isn't populating 'dev->pm_domain' and thus the clks for the device
> aren't enabled when this gdsc is enabled.

We were looking for the opposite way: to enable the power domain, when
the GDSC (and clocks) are accessed.
The bus clocks (DISP_CC_XO_CLK, GCC_DISP_AHB_CLK, etc.) are hardwired
into the on state forever.

> There is a GENPD_FLAG_PM_CLK flag that we could try, but
> genpd_resume_noirq() calls genpd_sync_power_on() before calling
> genpd_start_dev(), and the device it is trying to start there is the
> wrong device, it is the consumer of the gdsc. That looks like a
> dead-end.

I see.

[skipped the pm_clk part. I might be wrong, but I don't think it's
relevant. We have to enable the GDSC, but not the clocks to enable the
access.]

> This problem exposes that the power-domains that we implement for GDSCs
> are not "complete". If we listed the clocks required for the domain to
> operate properly, then we would be able to attach those to the gdsc's
> genpd and have one power domain that represents everything for the
> device in DT. Of course, if the device itself had some sort of "freeze"
> or "lock" register that caused the device to not work unless you
> unlocked it by writing a special value then we would need to also start
> the device in the PM domain. I don't see any sort of call to start
> parent devices in genpd_resume_noirq(), so I don't know how this would
> work. Luckily I don't have this problem today, but I'm just thinking of
> how we slice up the genpd code and the device specific code and hook
> that all into genpd.
>
> In a sense, we want to resume the device that is providing the genpd,
> but that isn't happening here. Instead, we're noirq resuming a genpd
> while the provider of that genpd is still suspended. I think typically
> genpds are registered by a bus entity that is "always on" and doesn't
> need to power manage itself. That disconnect is where we're getting into
> trouble.

I think that genpd's are registered in the logical order. So the
parents are resumed before the child because they come earlier in the
device list. Is my assumption correct?

> TL;DR: This patch seems fairly broken. It works only for boot time probe
> and then breaks suspend/resume. I think we should revert it and work on
> a proper solution. Can we do that? Or are there DT dependencies stacked
> on top to move away from the regulator design?

No, the regulators are completely removed. I think I still need to
send the revert for the fixed-regulator driver, but the code is not
used anyway.

Just reverting the commit would kill the display & video on all recent
platforms. We can mark MMCX as ALWAYS_ON for some time.
However I'd like to try if anything is necessary at all to fix this
issue. I suppose that setting up the subdomain should be enough, but
I'd like to run some thorough tests before declaring the result.

-- 
With best wishes
Dmitry



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux