From: Rob Clark <robdclark@xxxxxxxxxxxx>

I've seen a few crashes like:

CPU: 0 PID: 216 Comm: A618-worker Tainted: G W 5.4.196 #7
Hardware name: Google Wormdingler rev1+ INX panel board (DT)
pstate: 20c00009 (nzCv daif +PAN +UAO)
pc : msm_readl+0x14/0x34
lr : a6xx_gpu_busy+0x40/0x80
sp : ffffffc011b93ad0
x29: ffffffc011b93ad0 x28: ffffffe77cba3000
x27: 0000000000000001 x26: ffffffe77bb4c4ac
x25: ffffffa2f227dfa0 x24: ffffffa2f22aab28
x23: 0000000000000000 x22: ffffffa2f22bf020
x21: ffffffa2f22bf000 x20: ffffffc011b93b10
x19: ffffffc011bd4110 x18: 000000000000000e
x17: 0000000000000004 x16: 000000000000000c
x15: 000001be3a969450 x14: 0000000000000400
x13: 00000000000101d6 x12: 0000000034155555
x11: 0000000000000001 x10: 0000000000000000
x9 : 0000000100000000 x8 : ffffffc011bd4000
x7 : 0000000000000000 x6 : 0000000000000007
x5 : ffffffc01d8b38f0 x4 : 0000000000000000
x3 : 00000000ffffffff x2 : 0000000000000002
x1 : 0000000000000000 x0 : ffffffc011bd4110
Call trace:
 msm_readl+0x14/0x34
 a6xx_gpu_busy+0x40/0x80
 msm_devfreq_get_dev_status+0x70/0x1d0
 devfreq_simple_ondemand_func+0x34/0x100
 update_devfreq+0x50/0xe8
 qos_notifier_call+0x2c/0x64
 qos_max_notifier_call+0x1c/0x2c
 notifier_call_chain+0x58/0x98
 __blocking_notifier_call_chain+0x74/0x84
 blocking_notifier_call_chain+0x38/0x48
 pm_qos_update_target+0xf8/0x19c
 freq_qos_apply+0x54/0x6c
 apply_constraint+0x60/0x104
 __dev_pm_qos_update_request+0xb4/0x184
 dev_pm_qos_update_request+0x38/0x58
 msm_devfreq_idle_work+0x34/0x40
 kthread_worker_fn+0x144/0x1c8
 kthread+0x140/0x284
 ret_from_fork+0x10/0x18
Code: f9000bf3 910003fd aa0003f3 d503201f (b9400260)
---[ end trace f6309767a42d0831 ]---

Which smells a lot like touching hw after power collapse. This seems a bit
like a race/timing issue elsewhere, as pm_runtime_get_if_in_use() in
a6xx_gpu_busy() should have kept us from touching hw if it wasn't powered.

But, we've seen cases where the idle_work scheduled by msm_devfreq_idle()
ends up racing with the resume path.
Which, again, shouldn't be a problem other than unnecessary freq changes.

v2. Only move the runpm _put_autosuspend, and not the _mark_last_busy()

Fixes: 9bc95570175a ("drm/msm: Devfreq tuning")
Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
Link: https://lore.kernel.org/r/20210927152928.831245-1-robdclark@xxxxxxxxx
---
 drivers/gpu/drm/msm/msm_gpu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index eb8a6663f309..244511f85044 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -672,7 +672,6 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 	msm_submit_retire(submit);
 
 	pm_runtime_mark_last_busy(&gpu->pdev->dev);
-	pm_runtime_put_autosuspend(&gpu->pdev->dev);
 
 	spin_lock_irqsave(&ring->submit_lock, flags);
 	list_del(&submit->node);
@@ -686,6 +685,8 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
 		msm_devfreq_idle(gpu);
 	mutex_unlock(&gpu->active_lock);
 
+	pm_runtime_put_autosuspend(&gpu->pdev->dev);
+
 	msm_gem_submit_put(submit);
 }
 
-- 
2.36.1