Quoting Jordan Crouse (2018-11-19 15:47:06) > 99.999% of the time during normal operation the GMU is responsible > for power and clock control on the GX domain and the CPU remains > blissfully unaware. However, there is one situation where the CPU > needs to get involved: > > The power sequencing rules dictate that the GX needs to be turned > off before the CX so that the CX can be turned on before the GX > during power up. During normal operation when the CPU is taking > down the CX domain a stop command is sent to the GMU which turns > off the GX domain and then the CPU handles the CX domain. > > But if the GMU happened to be unresponsive while the GX domain was > left then the CPU will need to step in and turn off the GX domain ^ left on? > before resetting the CX and rebooting the GMU. This unfortunately > means that the CPU needs to be marginally aware of the GX domain > even though it is expected to usually keep its hands off. > > To support this we create a semi-disabled GX power domain that > does nothing to the hardware on power up but tries to shut it > down normally on power down. In this method the reference counting > is correct and we can step in with the pm_runtime_put() at the right > time during the failure path. > > This patch sets up the connection to the GX power domain and does > the magic to "enable" and disable it at the right points. > > Signed-off-by: Jordan Crouse <jcrouse@xxxxxxxxxxxxxx> The pm_runtime gymnastics is scary! But I'm willing to go with it. > --- > drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 41 ++++++++++++++++++++++++++- > drivers/gpu/drm/msm/adreno/a6xx_gmu.h | 2 ++ > 2 files changed, 42 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > index 51493f409358..ca71709efc94 100644 > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > @@ -1142,9 +1169,15 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu) > if (IS_ERR_OR_NULL(gmu->mmio)) > return; > > - pm_runtime_disable(gmu->dev); > a6xx_gmu_stop(a6xx_gpu); > > + pm_runtime_disable(gmu->dev); > + > + if (!IS_ERR(gmu->gxpd)) { > + pm_runtime_disable(gmu->gxpd); > + dev_pm_domain_detach(gmu->gxpd, false); > + } > + > a6xx_gmu_irq_disable(gmu); > a6xx_gmu_memory_free(gmu, gmu->hfi); > > @@ -1203,6 +1236,12 @@ int a6xx_gmu_probe(struct a6xx_gpu *a6xx_gpu, struct device_node *node) > if (gmu->hfi_irq < 0 || gmu->gmu_irq < 0) > goto err; > > + /* > + * Get a link to the GX power domain to reset the GPU in case of GMU > + * crash > + */ > + gmu->gxpd = dev_pm_domain_attach_by_name(gmu->dev, "gx"); Are there a6xx devices that don't have this genpd? Just curious if we could always assume that if it's an a6xx gpu then this must be there and we can always call pm_runtime_get/put on it and avoid the if (!IS_ERR()) checks. > + > /* Get the power levels for the GMU and GPU */ > a6xx_gmu_pwrlevels_probe(gmu); >