On Tue, Apr 12, 2022 at 9:41 PM Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> wrote: > > > On 2022-04-12 14:20, Dmitry Osipenko wrote: > > On 4/12/22 19:51, Andrey Grodzovsky wrote: > >> On 2022-04-11 18:15, Dmitry Osipenko wrote: > >>> Interrupt context can't sleep. Drivers like Panfrost and MSM are taking > >>> mutex when job is released, and thus, that code can sleep. This results > >>> into "BUG: scheduling while atomic" if locks are contented while job is > >>> freed. There is no good reason for releasing scheduler's jobs in IRQ > >>> context, hence use normal context to fix the trouble. > >> > >> I am not sure this is the beast Idea to leave job's sw fence signalling > >> to be > >> executed in system_wq context which is prone to delays of executing > >> various work items from around the system. Seems better to me to leave the > >> fence signaling within the IRQ context and offload only the job freeing or, > >> maybe handle rescheduling to thread context within drivers implemention > >> of .free_job cb. Not really sure which is the better. > > We're talking here about killing jobs when driver destroys context, > > which doesn't feel like it needs to be a fast path. I could move the > > signalling into drm_sched_entity_kill_jobs_cb() and use unbound wq, but > > do we really need this for a slow path? > > > You can't move the signaling back to drm_sched_entity_kill_jobs_cb > since this will bring back the lockdep splat that 'drm/sched: Avoid > lockdep spalt on killing a processes' > was fixing. > > I see your point and i guess we can go this way too. Another way would > be to add to > panfrost and msm job a work_item and reschedule to thread context from > within their > .free_job callbacks but that probably to cumbersome to be justified here. FWIW since this mentioned individual drivers, commit 'drm/sched: Avoid lockdep spalt on killing a processes' also introduced problems for lima. There were some occurrences in our CI https://gitlab.freedesktop.org/mesa/mesa/-/jobs/20980982/raw . Later I found it also reproducible on normal usage when just closing applications, so it may be affecting users too. I tested this patch and looks like it fixes things for lima. Thanks Erico