Re: [PATCH v3] drm/panfrost: Fix job timeout handling

Steven Price <steven.price@xxxxxxx> · Thu, 8 Oct 2020 11:04:15 +0100

On 02/10/2020 13:25, Boris Brezillon wrote:
If more than two jobs end up timeout-ing concurrently, only one of them
(the one attached to the scheduler acquiring the lock) is fully handled.
The other one remains in a dangling state where it's no longer part of
the scheduling queue, but still blocks something in scheduler, leading
to repetitive timeouts when new jobs are queued.

Let's make sure all bad jobs are properly handled by the thread
acquiring the lock.

v3:
- Add Steven's R-b
- Don't take the sched_lock when stopping the schedulers

v2:
- Fix the subject prefix
- Stop the scheduler before returning from panfrost_job_timedout()
- Call cancel_delayed_work_sync() after drm_sched_stop() to make sure
   no timeout handlers are in flight when we reset the GPU (Steven Price)
- Make sure we release the reset lock before restarting the
   schedulers (Steven Price)

Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
Reviewed-by: Steven Price <steven.price@xxxxxxx>

Applied to drm-misc-next, thanks!

Steve