On Thu, Jan 18, 2024 at 7:14 PM Erico Nunes <nunes.erico@xxxxxxxxx> wrote:
>
> On Thu, Jan 18, 2024 at 2:36 AM Qiang Yu <yuq825@xxxxxxxxx> wrote:
> >
> > So this is caused by same job trigger both done and timeout handling?
> > I think a better way to solve this is to make sure only one handler
> > (done or timeout) process the job instead of just making lima_pm_idle()
> > unique.
>
> It's not very clear to me how to best ensure that, with the drm_sched
> software timeout and the irq happening potentially at the same time.

This could be done by stopping the scheduler from running more jobs and
disabling the GP/PP interrupts. Then, after syncing the irq, no new irq
can come in while we are handling the timeout.

> I think patch 4 in this series describes and covers the most common
> case that this would be hit. So maybe now this patch could be dropped
> in favour of just that one.

Yes.

> But since this was a bit hard to reproduce and I'm not sure the issue
> is entirely covered by that, I just decided to keep this small change
> as it prevented all the stack trace reproducers I was able to come up
> with.
>
> Erico
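The sequence suggested in the reply can be illustrated with a small userspace model. This is a sketch under stated assumptions, not lima or drm_sched code. It assumes the hypothetical names `FakeDev`, `irq_handler`, `mask_and_sync_irq`, `timeout_handler` and `run_trials`. A `threading.Lock` held for the whole handler stands in for the guarantee that, in the kernel, `disable_irq()`/`synchronize_irq()` wait for a running handler to finish. Stopping the scheduler is not modelled at all. The model shows that once the interrupt is masked and synced before the timeout path inspects the job, exactly one of the done and timeout paths processes it.

```python
import threading


class FakeDev:
    """Hypothetical userspace model of one GPU job (not lima/drm_sched code)."""

    def __init__(self):
        self.irq_lock = threading.Lock()  # held for the whole "irq handler" run
        self.irq_enabled = True           # models the GP/PP interrupt mask
        self.job_done = False             # set by whichever path processed the job
        self.handled_by_irq = 0
        self.handled_by_timeout = 0

    def irq_handler(self):
        # "Done" path: only acts while the interrupt is enabled.
        with self.irq_lock:
            if self.irq_enabled and not self.job_done:
                self.job_done = True
                self.handled_by_irq += 1

    def mask_and_sync_irq(self):
        # Models masking the interrupt and then syncing it: taking the lock
        # waits for a handler that is already running, and any handler that
        # runs afterwards sees irq_enabled == False and does nothing.
        with self.irq_lock:
            self.irq_enabled = False

    def timeout_handler(self):
        # Assumed step 1 (not modelled): stop the scheduler running more jobs.
        self.mask_and_sync_irq()
        # No handler can touch the job any more, so this check cannot race.
        if not self.job_done:
            self.job_done = True
            self.handled_by_timeout += 1


def run_trials(n):
    """Race the done path against the timeout path n times.

    Returns the number of trials where the job was not processed exactly once.
    """
    bad = 0
    for _ in range(n):
        dev = FakeDev()
        irq = threading.Thread(target=dev.irq_handler)
        irq.start()
        dev.timeout_handler()
        irq.join()
        if dev.handled_by_irq + dev.handled_by_timeout != 1:
            bad += 1
    return bad


if __name__ == "__main__":
    print("bad trials:", run_trials(1000))
```

Whichever order the two threads run in, the job is processed once. The irq path either wins before the mask or is ignored after it, and the timeout path only decides after the sync.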