On 29.03.2023 19:30, Rob Clark wrote: > On Wed, Mar 29, 2023 at 8:48 AM Konrad Dybcio <konrad.dybcio@xxxxxxxxxx> wrote: >> >> >> >> On 29.03.2023 16:37, Johan Hovold wrote: >>> On Wed, Mar 29, 2023 at 04:04:44PM +0200, Konrad Dybcio wrote: >>>> If we fail to initialize the GPU for whatever reason (say we don't >>>> embed the GPU firmware files in the initrd), the error path involves >>>> pm_runtime_put_sync() which then calls idle() instead of suspend(). >>>> >>>> This is suboptimal, as it means that we're not going through the >>>> clean shutdown sequence. With at least A619_holi, this makes the GPU >>>> not wake up until it goes through at least one more start-fail-stop >>>> cycle. Fix that by using pm_runtime_put_sync_suspend to force a clean >>>> shutdown. >>> >>> This does not sound right. If pm_runtime_put_sync() fails to suspend the >>> device when the usage count drops to zero, then you have a bug somewhere >>> else. >> I was surprised to see that it was not called as well, but I wasn't able >> to track it down before.. >> >>> >>> Also since commit 2c087a336676 ("drm/msm/adreno: Load the firmware >>> before bringing up the hardware") the firmware is loaded before even >>> hitting these paths so the above description does not sound right in >>> that respect either (or is missing some details). >> ..but I did some more digging and I found that the precise "firmware" >> that fails is the ZAP blob, which is not checked like SQE in the >> commit you mentioned! >> >> Now I don't think that we can easily check for it as-is since >> zap_shader_load_mdt() does the entire find-load-authenticate >> dance which is required with secure assets, but it's obviously >> possible to rip out the find-load part of that and go on from >> there. >> >> Do you think that would be a better solution? > > Hmm, to hit this it sounds like you'd need all the fw _except_ the zap > in the initrd? Correct. Konrad > > BR, > -R > >> Konrad >> >>> >>>> Test cases: >>>> 1. firmware baked into kernel >>>> 2. error loading fw in initrd -> load from rootfs at DE start >>>> >>>> Both succeed on A619_holi (SM6375) and A630 (SDM845). >>>> >>>> Fixes: 0d997f95b70f ("drm/msm/adreno: fix runtime PM imbalance at gpu load") >>>> Signed-off-by: Konrad Dybcio <konrad.dybcio@xxxxxxxxxx> >>>> --- >>>> drivers/gpu/drm/msm/adreno/adreno_device.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c >>>> index f61896629be6..59f3302e8167 100644 >>>> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c >>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c >>>> @@ -477,7 +477,7 @@ struct msm_gpu *adreno_load_gpu(struct drm_device *dev) >>>> return gpu; >>>> >>>> err_put_rpm: >>>> - pm_runtime_put_sync(&pdev->dev); >>>> + pm_runtime_put_sync_suspend(&pdev->dev); >>>> err_disable_rpm: >>>> pm_runtime_disable(&pdev->dev); >>> >>> Johan