Recently, I debugged a few device crashes which occurred during recovery
after a hangcheck timeout. It looks like there are a few things we can
do to improve our chances of a successful GPU recovery.

The first is to ensure that the CX GDSC collapses, which clears the
internal state in the GPU's CX domain. The first 5 patches try to handle
this.

The rest of the patches ensure that a few internal blocks like CP, GMU
and GBIF are halted properly before proceeding to a snapshot followed by
recovery. They also handle the 'prepare slumber' HFI failure correctly.
These are A6x specific improvements.

Changes in v2:
- Rebased on msm-next tip

Akhil P Oommen (7):
  drm/msm: Remove unnecessary pm_runtime_get/put
  drm/msm: Correct pm_runtime votes in recover worker
  drm/msm: Fix cx collapse issue during recovery
  drm/msm: Ensure cx gdsc collapse during recovery
  arm64: dts: qcom: sc7280: Update gpu register list
  drm/msm/a6xx: Improve gpu recovery sequence
  drm/msm/a6xx: Handle GMU prepare-slumber hfi failure

 arch/arm64/boot/dts/qcom/sc7280.dtsi  |  6 ++-
 drivers/gpu/drm/msm/adreno/a6xx.xml.h |  4 ++
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 83 ++++++++++++++++++++++-------------
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 36 +++++++++++++--
 drivers/gpu/drm/msm/msm_gpu.c         |  9 ++--
 drivers/gpu/drm/msm/msm_gpu.h         |  1 +
 drivers/gpu/drm/msm/msm_ringbuffer.c  |  4 --
 7 files changed, 100 insertions(+), 43 deletions(-)

-- 
2.7.4