Quoting Dmitry Baryshkov (2022-10-27 11:13:44) > On 27/10/2022 01:18, Stephen Boyd wrote: > > Reviving this old thread because this commit has lead to a couple bugs > > now. > > > > Quoting Ulf Hansson (2022-06-22 03:26:52) > >> On Fri, 17 Jun 2022 at 21:58, Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote: > >>> > >>> Hi Bjorn and Dmitry, > >>> > >>> Yu reported a lockdep warning coming from the gdsc driver. It looks like > >>> the runtime PM usage in gdsc.c is causing lockdep to see an AA deadlock > >>> possibility with 'genpd->mlock'. I suspect this is because we have > >>> commit 1b771839de05 ("clk: qcom: gdsc: enable optional power domain > >>> support"), and that is now calling runtime PM code from within the genpd > >>> code. > > > > This commit has caused a deadlock at boot for Doug[1] and I see that the > > camera driver on Google CoachZ and Wormdingler devices doesn't work > > after resuming from suspend once this commit is applied. I'm leaning > > towards sending a revert, because it seems to cause 3 issues while > > removing the regulator hack that was in place to enable MMCX. This patch > > is cleaning up the hack, but trading the hack for three more problems. > > > >> I think genpd already has nested lock support, so the only > >>> solution is to not use runtime PM from within genpd code and start > >>> expressing genpd parent relationships in genpd itself? > >> > >> Not sure exactly what you mean here, but yes, expressing the > >> parent/child domain relationship is always needed. > >> > >> Having gdsc_disable() to do runtime PM calls (gdsc_pm_runtime_put()) > >> seems awkward to me. Why is that needed, more exactly? > > > > It seems like this is needed so that the gdsc_enable() and > > gdsc_disable() calls can read/write the registers for the genpd, while > > those registers live in some clk controller that needs either a > > different clk (probably some AHB clk) or another genpd to be enabled. It > > looks like for qcom,sm8250-dispcc it relies on MMCX gdsc (which is a > > genpd). From a hardware view, the MDSS_GDSC provided by the display clk > > controller is probably not a sub-domain of MMCX. Instead, we need to > > have MMCX enabled so that we can access the registers for the MDSS GDSC. > > Yes, exactly. Thanks for confirming. I've debugged further and found that we can't use runtime PM calls here in the gdsc enable/disable functions at all. What happens is runtime PM is disabled during device late suspend (see __device_suspend_late() and how it calls __pm_runtime_disable() early on). All genpds are assigned noirq phase suspend/resume operations that will power on and off the genpd during system wide suspend/resume (see the genpd->domain.ops assignments in pm_genpd_init() and how noirq PM ops are used). When it comes time to resume the system, we'll try to enable the gdsc, and call into gdsc_enable() which will try to call pm_runtime APIs on the parent clk controller during the noirq phase of resume and that is doomed to fail. > > > > > My question is if it makes sense to simply describe that the GDSCs > > provided by a device are sub-domains of whatever power domains are > > listed in DT for that device? I think if we did that here for sm8250 > > dispcc, we wouldn't need to use runtime PM within the genpd code > > assuming that the MMCX parent genpd is enabled before we attempt to > > read/write the dispcc gdsc registers. Hopefully that is also done, i.e. > > enabling parent domains before enabling child domains if the parent is > > disabled. > > I will check this tomorrow. It should be possible to handle the > MMCX/MDSS_GDSC relationship in this way. > > > Is this already being done with pm_genpd_add_subdomain() in > > gdsc_register()? I see that we're attaching that to dispcc's struct > > device::pm_domain, but I assume that is different from the MMCX genpd. > > No, I think the only domain there is the MMCX domain, so this call s > Did this get deleted? > > Maybe that is the problem here. Dmitry can you further describe the > > problem being solved? > > I must admit, I don't remember what caused me to add this call. May be > it was added before me getting the pm_genpd_add_subdomain() call in place. > I see that the 'dev->pm_domain' pointer isn't assigned unless I have a 'power-domains = <&some_pd>' property in DT for the clock controller registering GDSCs. This means that a driver using the pm clock APIs isn't populating 'dev->pm_domain' and thus the clks for the device aren't enabled when this gdsc is enabled. There is a GENPD_FLAG_PM_CLK flag that we could try, but genpd_resume_noirq() calls genpd_sync_power_on() before calling genpd_start_dev(), and the device it is trying to start there is the wrong device, it is the consumer of the gdsc. That looks like a dead-end. We really need some sort of way to mix pm clks with DT power-domains and tie that all up into some struct generic_pm_domain that turns on the DT power-domain along with the clks needed for the device to be accessible. If we had that domain we could then attach the GDSCs registered here as subdomains of that domain so that we didn't use runtime PM APIs. Possibly we can make a struct generic_pm_domain in the drivers that need this and have it call pm_clk_resume() directly and set the domain with dev_pm_domain_set()? I believe that runs into a problem though if we have a power-domain in DT and want to use pm_clks as well. The power-domain from DT will be auto-attached during probe and we don't want to do that. This problem exposes that the power-domains that we implement for GDSCs are not "complete". If we listed the clocks required for the domain to operate properly, then we would be able to attach those to the gdsc's genpd and have one power domain that represents everything for the device in DT. Of course, if the device itself had some sort of "freeze" or "lock" register that caused the device to not work unless you unlocked it by writing a special value then we would need to also start the device in the PM domain. I don't see any sort of call to start parent devices in genpd_resume_noirq(), so I don't know how this would work. Luckily I don't have this problem today, but I'm just thinking of how we slice up the genpd code and the device specific code and hook that all into genpd. In a sense, we want to resume the device that is providing the genpd, but that isn't happening here. Instead, we're noirq resuming a genpd while the provider of that genpd is still suspended. I think typically genpds are registered by a bus entity that is "always on" and doesn't need to power manage itself. That disconnect is where we're getting into trouble. TL;DR: This patch seems fairly broken. It works only for boot time probe and then breaks suspend/resume. I think we should revert it and work on a proper solution. Can we do that? Or are there DT dependencies stacked on top to move away from the regulator design?