So this is caused by the same job triggering both the done and the
timeout handling? I think a better way to solve this is to make sure
only one handler (done or timeout) processes the job, instead of just
making sure lima_pm_idle() is called only once.

Regards,
Qiang

On Wed, Jan 17, 2024 at 11:12 AM Erico Nunes <nunes.erico@xxxxxxxxx> wrote:
>
> In case a task manages to complete but it took just long enough to also
> trigger the timeout handler, the current code results in a refcount
> imbalance on lima_pm_idle.
>
> While this can be a rare occurrence, when it happens it may fill user
> logs with stack traces such as:
>
> [10136.669170] WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/lima/lima_devfreq.c:205 lima_devfreq_record_idle+0xa0/0xb0
> ...
> [10136.669459] pc : lima_devfreq_record_idle+0xa0/0xb0
> ...
> [10136.669628] Call trace:
> [10136.669634]  lima_devfreq_record_idle+0xa0/0xb0
> [10136.669646]  lima_sched_pipe_task_done+0x5c/0xb0
> [10136.669656]  lima_gp_irq_handler+0xa8/0x120
> [10136.669666]  __handle_irq_event_percpu+0x48/0x160
> [10136.669679]  handle_irq_event+0x4c/0xc0
>
> The imbalance happens because lima_sched_pipe_task_done() already calls
> lima_pm_idle for this case if there was no error.
> Check the error flag in the timeout handler to ensure we can never run
> into this case.
>
> Signed-off-by: Erico Nunes <nunes.erico@xxxxxxxxx>
> ---
>  drivers/gpu/drm/lima/lima_sched.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index c3bf8cda8498..66317296d831 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -427,7 +427,8 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
>  	pipe->current_vm = NULL;
>  	pipe->current_task = NULL;
>
> -	lima_pm_idle(ldev);
> +	if (pipe->error)
> +		lima_pm_idle(ldev);
>
>  	drm_sched_resubmit_jobs(&pipe->base);
>  	drm_sched_start(&pipe->base, true);
> --
> 2.43.0
>
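
To illustrate the "only one handler processes the job" idea, here is a
minimal userspace sketch in plain C11. It is not lima code and not a
proposed patch: toy_task, toy_task_claim() and pm_busy_count are
hypothetical names invented for this example, and a real driver would
have to follow its own locking and scheduler rules. The point is only
that whichever handler claims the task first does the idle accounting,
and the other one backs off.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a scheduled task; not the lima structure. */
struct toy_task {
	atomic_bool claimed;	/* set by whichever handler gets there first */
};

/* Models the busy/idle refcount; must return to 0 after each task. */
static atomic_int pm_busy_count;

static void toy_task_start(struct toy_task *t)
{
	atomic_init(&t->claimed, false);
	atomic_fetch_add(&pm_busy_count, 1);	/* one "busy" per started task */
}

/* Returns true for exactly one caller per task, false for any later one. */
static bool toy_task_claim(struct toy_task *t)
{
	return !atomic_exchange(&t->claimed, true);
}

/* Completion path: only does the idle accounting if it won the claim. */
static bool toy_done_handler(struct toy_task *t)
{
	if (!toy_task_claim(t))
		return false;	/* timeout handler already owns this task */
	atomic_fetch_sub(&pm_busy_count, 1);
	return true;
}

/* Timeout path: same rule, so the two paths can never both drop the count. */
static bool toy_timeout_handler(struct toy_task *t)
{
	if (!toy_task_claim(t))
		return false;	/* task completed just before the timeout fired */
	atomic_fetch_sub(&pm_busy_count, 1);
	return true;
}
```

With this shape the count stays balanced no matter which handler runs
first, without the timeout path having to infer from an error flag
whether the done path already ran.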