On Tue, 1 Nov 2022 at 03:43, Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote: > > Quoting Dmitry Baryshkov (2022-10-27 11:13:44) > > On 27/10/2022 01:18, Stephen Boyd wrote: > > > Reviving this old thread because this commit has lead to a couple bugs > > > now. > > > > > > Quoting Ulf Hansson (2022-06-22 03:26:52) > > >> On Fri, 17 Jun 2022 at 21:58, Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote: > > >>> > > >>> Hi Bjorn and Dmitry, > > >>> > > >>> Yu reported a lockdep warning coming from the gdsc driver. It looks like > > >>> the runtime PM usage in gdsc.c is causing lockdep to see an AA deadlock > > >>> possibility with 'genpd->mlock'. I suspect this is because we have > > >>> commit 1b771839de05 ("clk: qcom: gdsc: enable optional power domain > > >>> support"), and that is now calling runtime PM code from within the genpd > > >>> code. > > > > > > This commit has caused a deadlock at boot for Doug[1] and I see that the > > > camera driver on Google CoachZ and Wormdingler devices doesn't work > > > after resuming from suspend once this commit is applied. I'm leaning > > > towards sending a revert, because it seems to cause 3 issues while > > > removing the regulator hack that was in place to enable MMCX. This patch > > > is cleaning up the hack, but trading the hack for three more problems. > > > > > >> I think genpd already has nested lock support, so the only > > >>> solution is to not use runtime PM from within genpd code and start > > >>> expressing genpd parent relationships in genpd itself? > > >> > > >> Not sure exactly what you mean here, but yes, expressing the > > >> parent/child domain relationship is always needed. > > >> > > >> Having gdsc_disable() to do runtime PM calls (gdsc_pm_runtime_put()) > > >> seems awkward to me. Why is that needed, more exactly? > > > > > > It seems like this is needed so that the gdsc_enable() and > > > gdsc_disable() calls can read/write the registers for the genpd, while > > > those registers live in some clk controller that needs either a > > > different clk (probably some AHB clk) or another genpd to be enabled. It > > > looks like for qcom,sm8250-dispcc it relies on MMCX gdsc (which is a > > > genpd). From a hardware view, the MDSS_GDSC provided by the display clk > > > controller is probably not a sub-domain of MMCX. Instead, we need to > > > have MMCX enabled so that we can access the registers for the MDSS GDSC. > > > > Yes, exactly. > > Thanks for confirming. I've debugged further and found that we can't use > runtime PM calls here in the gdsc enable/disable functions at all. What > happens is runtime PM is disabled during device late suspend (see > __device_suspend_late() and how it calls __pm_runtime_disable() early > on). All genpds are assigned noirq phase suspend/resume operations that > will power on and off the genpd during system wide suspend/resume (see > the genpd->domain.ops assignments in pm_genpd_init() and how noirq PM > ops are used). > > When it comes time to resume the system, we'll try to enable the gdsc, > and call into gdsc_enable() which will try to call pm_runtime APIs on > the parent clk controller during the noirq phase of resume and that is > doomed to fail. > > > > > > > > > My question is if it makes sense to simply describe that the GDSCs > > > provided by a device are sub-domains of whatever power domains are > > > listed in DT for that device? I think if we did that here for sm8250 > > > dispcc, we wouldn't need to use runtime PM within the genpd code > > > assuming that the MMCX parent genpd is enabled before we attempt to > > > read/write the dispcc gdsc registers. Hopefully that is also done, i.e. > > > enabling parent domains before enabling child domains if the parent is > > > disabled. > > > > I will check this tomorrow. It should be possible to handle the > > MMCX/MDSS_GDSC relationship in this way. > > > > > Is this already being done with pm_genpd_add_subdomain() in > > > gdsc_register()? I see that we're attaching that to dispcc's struct > > > device::pm_domain, but I assume that is different from the MMCX genpd. > > > > No, I think the only domain there is the MMCX domain, so this call s > > > > Did this get deleted? No, the regulator was deleted. But for some time we had the following in display and video clock controller nodes: power-domains = <&rpmhpd SM8250_MMCX>; required-opps = <&rpmhpd_opp_low_svs>; > > > > Maybe that is the problem here. Dmitry can you further describe the > > > problem being solved? > > > > I must admit, I don't remember what caused me to add this call. May be > > it was added before me getting the pm_genpd_add_subdomain() call in place. > > > > I see that the 'dev->pm_domain' pointer isn't assigned unless I have a > 'power-domains = <&some_pd>' property in DT for the clock controller > registering GDSCs. This means that a driver using the pm clock APIs > isn't populating 'dev->pm_domain' and thus the clks for the device > aren't enabled when this gdsc is enabled. We were looking for the opposite way: to enable the power domain, when the GDSC (and clocks) are accessed. The bus clocks (DISP_CC_XO_CLK, GCC_DISP_AHB_CLK, etc.) are hardwired into the on state forever. > There is a GENPD_FLAG_PM_CLK flag that we could try, but > genpd_resume_noirq() calls genpd_sync_power_on() before calling > genpd_start_dev(), and the device it is trying to start there is the > wrong device, it is the consumer of the gdsc. That looks like a > dead-end. I see. [skipped the pm_clk part. I might be wrong, but I don't think it's relevant. We have to enable the GDSC, but not the clocks to enable the access.] > This problem exposes that the power-domains that we implement for GDSCs > are not "complete". If we listed the clocks required for the domain to > operate properly, then we would be able to attach those to the gdsc's > genpd and have one power domain that represents everything for the > device in DT. Of course, if the device itself had some sort of "freeze" > or "lock" register that caused the device to not work unless you > unlocked it by writing a special value then we would need to also start > the device in the PM domain. I don't see any sort of call to start > parent devices in genpd_resume_noirq(), so I don't know how this would > work. Luckily I don't have this problem today, but I'm just thinking of > how we slice up the genpd code and the device specific code and hook > that all into genpd. > > In a sense, we want to resume the device that is providing the genpd, > but that isn't happening here. Instead, we're noirq resuming a genpd > while the provider of that genpd is still suspended. I think typically > genpds are registered by a bus entity that is "always on" and doesn't > need to power manage itself. That disconnect is where we're getting into > trouble. I think that genpd's are registered in the logical order. So the parents are resumed before the child because they come earlier in the device list. Is my assumption correct? > TL;DR: This patch seems fairly broken. It works only for boot time probe > and then breaks suspend/resume. I think we should revert it and work on > a proper solution. Can we do that? Or are there DT dependencies stacked > on top to move away from the regulator design? No, the regulators are completely removed. I think I still need to send the revert for the fixed-regulator driver, but the code is not used anyway. Just reverting the commit would kill the display & video on all recent platforms. We can mark MMCX as ALWAYS_ON for some time. However I'd like to try if anything is necessary at all to fix this issue. I suppose that setting up the subdomain should be enough, but I'd like to run some thorough tests before declaring the result. -- With best wishes Dmitry