On Thu, Nov 17, 2022 at 7:12 AM Dmitry Osipenko <dmitry.osipenko@xxxxxxxxxxxxx> wrote:
>
> On 11/17/22 18:09, Christian König wrote:
> > On 17.11.22 at 15:41, Dmitry Osipenko wrote:
> >> [SNIP]
> >>> drm_sched_entity_flush() should be called from the flush callback in
> >>> the file_operations structure of panfrost. See amdgpu_flush() and
> >>> amdgpu_ctx_mgr_entity_flush(). This makes sure that we wait for all
> >>> entities of the process/file descriptor to be flushed out.
> >>>
> >>> drm_sched_entity_fini() must be called before you free the memory of the
> >>> entity structure, otherwise we would run into a use-after-free.
> >> Right, drm_sched_entity_destroy() invokes these two functions and
> >> Panfrost uses drm_sched_entity_destroy().
> >
> > Then I have no idea what's going wrong here.
> >
> > The scheduler should trivially finish with the entity and call
> > complete(&entity->entity_idle) in its main loop. No idea why this
> > doesn't happen. Can you investigate?
>
> I'll take a closer look. Hoped you may have a quick idea of what's wrong :)
>

As Jonathan mentioned, the same thing is happening on msm. I can
reproduce this by adding an assert in mesa (in this case, triggered
after 100 draws) and running an app under gdb. After the assert is
hit, if I try to exit mesa, it hangs.

The problem is that we somehow call drm_sched_entity_kill() twice.
The first call completes, but now the entity_idle completion is no
longer done, so the second call hangs forever.

BR,
-R
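
To make the double-kill hang concrete, here is a rough userspace sketch of
the completion bookkeeping. It assumes drm_sched_entity_kill() waits on
entity_idle with a plain wait_for_completion(); it is a toy model, not the
scheduler code. The completion helpers mirror the kernel's counter
semantics, while toy_entity, toy_entity_kill() and kills_that_finish() are
hypothetical names made up for illustration. The wait is replaced by a
non-blocking try-wait so the sketch reports "would hang" instead of
actually hanging.

```c
#include <limits.h>
#include <stdbool.h>

/* Toy model of struct completion: only the 'done' counter, no wait queue. */
struct completion {
	unsigned int done;
};

static void init_completion(struct completion *c)
{
	c->done = 0;
}

/* complete() lets exactly one waiter through: done is incremented once. */
static void complete(struct completion *c)
{
	if (c->done != UINT_MAX)
		c->done++;
}

/* complete_all() saturates done, so every later wait returns immediately. */
static void complete_all(struct completion *c)
{
	c->done = UINT_MAX;
}

/*
 * Non-blocking stand-in for wait_for_completion(): returns true if the
 * wait would return (consuming one 'done'), false if it would block.
 */
static bool try_wait_for_completion(struct completion *c)
{
	if (c->done == 0)
		return false;
	if (c->done != UINT_MAX)
		c->done--;
	return true;
}

/* Hypothetical entity that carries only the completion discussed above. */
struct toy_entity {
	struct completion entity_idle;
};

/* Hypothetical kill path: true if it would finish, false if it would hang. */
static bool toy_entity_kill(struct toy_entity *e)
{
	return try_wait_for_completion(&e->entity_idle);
}

/* Signal idle once, then kill twice; return how many kills would finish. */
static int kills_that_finish(bool use_complete_all)
{
	struct toy_entity e;
	int finished = 0;

	init_completion(&e.entity_idle);
	if (use_complete_all)
		complete_all(&e.entity_idle);
	else
		complete(&e.entity_idle);

	for (int i = 0; i < 2; i++)
		if (toy_entity_kill(&e))
			finished++;

	return finished;
}
```

In this model kills_that_finish(false) returns 1: the first kill consumes
the single complete() and the second finds done == 0, which is where the
real wait would block. The complete_all() variant is included only to
contrast the two signalling semantics, not as a proposed fix.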