Re: [PATCH 2/2] drm/lima: mask irqs in timeout path before hard reset

Reviewed-by: Qiang Yu <yuq825@xxxxxxxxx>

On Tue, Apr 2, 2024 at 5:20 AM Erico Nunes <nunes.erico@xxxxxxxxx> wrote:
>
> There is a race condition in which a rendering job might take just long
> enough to trigger the drm sched job timeout handler but still complete
> before the timeout handler performs the hard reset.
> The timeout handler does not expect the job to complete at that point.
> In some very specific cases this may currently result in a refcount
> imbalance on lima_pm_idle, with a stack dump such as:
>
> [10136.669170] WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/lima/lima_devfreq.c:205 lima_devfreq_record_idle+0xa0/0xb0
> ...
> [10136.669459] pc : lima_devfreq_record_idle+0xa0/0xb0
> ...
> [10136.669628] Call trace:
> [10136.669634]  lima_devfreq_record_idle+0xa0/0xb0
> [10136.669646]  lima_sched_pipe_task_done+0x5c/0xb0
> [10136.669656]  lima_gp_irq_handler+0xa8/0x120
> [10136.669666]  __handle_irq_event_percpu+0x48/0x160
> [10136.669679]  handle_irq_event+0x4c/0xc0
>
> We can prevent that race condition entirely by masking the irqs at the
> beginning of the timeout handler, at which point we give up on waiting
> for that job entirely.
> The irqs will be enabled again at the next hard reset, which the
> timeout handler already performs as part of its recovery.
>
> Signed-off-by: Erico Nunes <nunes.erico@xxxxxxxxx>
> ---
>  drivers/gpu/drm/lima/lima_sched.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index 66841503a618..bbf3f8feab94 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -430,6 +430,13 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
>                 return DRM_GPU_SCHED_STAT_NOMINAL;
>         }
>
> +       /*
> +        * The task might still finish while this timeout handler runs.
> +        * To prevent a race condition on its completion, mask all irqs
> +        * on the running core until the next hard reset completes.
> +        */
> +       pipe->task_mask_irq(pipe);
> +
>         if (!pipe->error)
>                 DRM_ERROR("%s job timeout\n", lima_ip_name(ip));
>
> --
> 2.44.0
>
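
For readers following the thread, the race described above can be
sketched as a small user-space model. This is only an illustration, not
the driver code: struct toy_pipe and all toy_* functions are
hypothetical names, and the model assumes that both the interrupt path
and the timeout path each record the device as idle once, which is what
the refcount imbalance in the commit message suggests.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a lima pipe; not the real struct lima_sched_pipe. */
struct toy_pipe {
	int busy_count;    /* models the busy/idle accounting */
	bool irq_masked;   /* set by the timeout handler in the patched flow */
	bool irq_pending;  /* the job finished and raised its interrupt */
	int underflows;    /* stands in for the WARNING in the stack dump */
};

/* Record one busy -> idle transition; an extra call is an imbalance. */
static void toy_record_idle(struct toy_pipe *p)
{
	if (p->busy_count == 0) {
		p->underflows++;
		return;
	}
	p->busy_count--;
}

/* Completion interrupt: does nothing while irqs are masked. */
static void toy_irq_handler(struct toy_pipe *p)
{
	if (p->irq_masked || !p->irq_pending)
		return;
	p->irq_pending = false;
	toy_record_idle(p);
}

/* Hard reset: drops any stale interrupt and unmasks irqs again. */
static void toy_hard_reset(struct toy_pipe *p)
{
	p->irq_pending = false;
	p->irq_masked = false;
}

/*
 * Timeout handler for a job that completes after the handler has started
 * but before the hard reset. Returns the number of accounting underflows.
 */
static int toy_timeout(bool mask_first)
{
	struct toy_pipe p = { .busy_count = 1 };

	if (mask_first)
		p.irq_masked = true;  /* the step this patch adds */

	p.irq_pending = true;         /* the job finishes late */
	toy_irq_handler(&p);          /* interrupt arrives mid-handler */

	toy_record_idle(&p);          /* timeout path does its own accounting */
	toy_hard_reset(&p);
	return p.underflows;
}
```

In this model, toy_timeout(false) records idle twice for a single busy
period, while toy_timeout(true) records it once because the late
interrupt is ignored until the reset unmasks irqs.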



