On 02/11/2023 14:26, AngeloGioacchino Del Regno wrote:
> Even though soft reset should ideally never fail, during development of
> some power management features I managed to get some bits wrong: this
> resulted in GPU soft reset failures, where the GPU was never able to
> recover, not even after suspend/resume cycles, meaning that the only
> way to get functionality back was to reboot the machine.
>
> Perform a hard reset after a soft reset failure to be able to recover
> the GPU during runtime (so, without any machine reboot).
>
> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/panfrost/panfrost_gpu.c  | 14 ++++++++++----
>  drivers/gpu/drm/panfrost/panfrost_regs.h |  1 +
>  2 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> index fad75b6e543e..7e9e2cf26e4d 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> @@ -60,14 +60,20 @@ int panfrost_gpu_soft_reset(struct panfrost_device *pfdev)
>
>  	gpu_write(pfdev, GPU_INT_MASK, 0);
>  	gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_RESET_COMPLETED);
> -	gpu_write(pfdev, GPU_CMD, GPU_CMD_SOFT_RESET);
>
> +	gpu_write(pfdev, GPU_CMD, GPU_CMD_SOFT_RESET);
>  	ret = readl_relaxed_poll_timeout(pfdev->iomem + GPU_INT_RAWSTAT,
>  		val, val & GPU_IRQ_RESET_COMPLETED, 100, 10000);
> -

I'm not sure what's going on with blank lines above - AFAICT there's no
actual change just a blank line being moved. It's best to avoid blank
line changes to keep the diff readable.

>  	if (ret) {
> -		dev_err(pfdev->dev, "gpu soft reset timed out\n");
> -		return ret;
> +		dev_err(pfdev->dev, "gpu soft reset timed out, attempting hard reset\n");
> +
> +		gpu_write(pfdev, GPU_CMD, GPU_CMD_HARD_RESET);
> +		ret = readl_relaxed_poll_timeout(pfdev->iomem + GPU_INT_RAWSTAT,
> +			val, val & GPU_IRQ_RESET_COMPLETED, 100, 10000);

NIT: checkpatch complains about the alignment here.

Other than the minor comments this looks fine. Hard reset isn't
something we want to use (there's a possibility of locking up the system
if it occurs during a bus transaction) but it can sometimes recover an
otherwise completely locked up GPU.

Steve

> +		if (ret) {
> +			dev_err(pfdev->dev, "gpu hard reset timed out\n");
> +			return ret;
> +		}
>  	}
>
>  	gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_MASK_ALL);
> diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
> index 55ec807550b3..c25743b05c55 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_regs.h
> +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
> @@ -44,6 +44,7 @@
>  	 GPU_IRQ_MULTIPLE_FAULT)
>  #define GPU_CMD				0x30
>  #define   GPU_CMD_SOFT_RESET		0x01
> +#define   GPU_CMD_HARD_RESET		0x02
>  #define   GPU_CMD_PERFCNT_CLEAR		0x03
>  #define   GPU_CMD_PERFCNT_SAMPLE		0x04
>  #define   GPU_CMD_CYCLE_COUNT_START	0x05
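
For anyone following the thread without the hardware to hand, the
soft-then-hard fallback in the hunk above can be sketched as a standalone
userspace mock. This is only an illustration, not driver code: struct
mock_gpu, mock_gpu_cmd(), mock_poll_reset_done() and mock_gpu_reset() are
hypothetical names made up for the example, the "reset completed" flag
stands in for GPU_IRQ_RESET_COMPLETED in GPU_INT_RAWSTAT, and -1 stands in
for the timeout error returned by the real poll helper. Only the two
command values are taken from the quoted panfrost_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Command values as they appear in the quoted panfrost_regs.h hunk. */
#define GPU_CMD_SOFT_RESET 0x01
#define GPU_CMD_HARD_RESET 0x02

/* Hypothetical mock of a GPU; not the real struct panfrost_device. */
struct mock_gpu {
	bool soft_reset_works;    /* does a soft reset ever complete? */
	bool hard_reset_works;    /* does a hard reset ever complete? */
	bool reset_completed;     /* stand-in for the RESET_COMPLETED raw IRQ bit */
	unsigned int hard_resets; /* number of hard resets issued so far */
};

/* Stand-in for gpu_write(pfdev, GPU_CMD, cmd). */
static void mock_gpu_cmd(struct mock_gpu *gpu, uint32_t cmd)
{
	if (cmd == GPU_CMD_SOFT_RESET && gpu->soft_reset_works)
		gpu->reset_completed = true;

	if (cmd == GPU_CMD_HARD_RESET) {
		gpu->hard_resets++;
		if (gpu->hard_reset_works)
			gpu->reset_completed = true;
	}
}

/*
 * Stand-in for the poll on GPU_INT_RAWSTAT: returns 0 once the completion
 * flag is seen, or -1 (assumed timeout code) after max_polls attempts.
 */
static int mock_poll_reset_done(const struct mock_gpu *gpu, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (gpu->reset_completed)
			return 0;
	}
	return -1;
}

/* Try a soft reset first; fall back to a hard reset only if it times out. */
static int mock_gpu_reset(struct mock_gpu *gpu)
{
	int ret;

	assert(gpu != NULL);
	gpu->reset_completed = false;

	mock_gpu_cmd(gpu, GPU_CMD_SOFT_RESET);
	ret = mock_poll_reset_done(gpu, 100);
	if (ret) {
		fprintf(stderr, "gpu soft reset timed out, attempting hard reset\n");

		mock_gpu_cmd(gpu, GPU_CMD_HARD_RESET);
		ret = mock_poll_reset_done(gpu, 100);
		if (ret) {
			fprintf(stderr, "gpu hard reset timed out\n");
			return ret;
		}
	}

	return 0;
}
```

In this mock the hard reset is issued only on the soft reset's timeout
path, so a healthy GPU never sees it, which matches the intent that hard
reset is a last resort.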