On 11/17/22 16:11, Christian König wrote:
> On 17.11.22 at 14:00, Dmitry Osipenko wrote:
>> On 11/17/22 15:59, Dmitry Osipenko wrote:
>>> On 11/17/22 15:55, Christian König wrote:
>>>> On 17.11.22 at 13:47, Dmitry Osipenko wrote:
>>>>> On 11/17/22 12:53, Christian König wrote:
>>>>>> On 17.11.22 at 03:36, Dmitry Osipenko wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 10/14/22 11:46, Christian König wrote:
>>>>>>>> +/* Remove the entity from the scheduler and kill all pending jobs */
>>>>>>>> +static void drm_sched_entity_kill(struct drm_sched_entity *entity)
>>>>>>>> +{
>>>>>>>> +	struct drm_sched_job *job;
>>>>>>>> +	struct dma_fence *prev;
>>>>>>>> +
>>>>>>>> +	if (!entity->rq)
>>>>>>>> +		return;
>>>>>>>> +
>>>>>>>> +	spin_lock(&entity->rq_lock);
>>>>>>>> +	entity->stopped = true;
>>>>>>>> +	drm_sched_rq_remove_entity(entity->rq, entity);
>>>>>>>> +	spin_unlock(&entity->rq_lock);
>>>>>>>> +
>>>>>>>> +	/* Make sure this entity is not used by the scheduler at the moment */
>>>>>>>> +	wait_for_completion(&entity->entity_idle);
>>>>>>> I'm always hitting a lockup here using the Panfrost driver on
>>>>>>> terminating Xorg. Reverting this patch helps. Any ideas how to
>>>>>>> fix it?
>>>>>>>
>>>>>> Well is the entity idle or are there some unsubmitted jobs left?
>>>>> Do you mean unsubmitted to h/w? IIUC, there are unsubmitted jobs left.
>>>>>
>>>>> I see that there are 5-6 incomplete (in-flight) jobs when
>>>>> panfrost_job_close() is invoked.
>>>>>
>>>>> There are 1-2 jobs that are constantly scheduled and finished once
>>>>> every few seconds after the lockup happens.
>>>> Well what drm_sched_entity_kill() is supposed to do is to prevent
>>>> pushing queued up stuff to the hw when the process which queued it is
>>>> killed. Is the process really killed or is that just some incorrect
>>>> handling?
>>> It's actually 5-6 incomplete jobs of Xorg that are hanging when the
>>> Xorg process is closed.
>>>
>>> The two re-scheduled jobs are from sddm, so it's only the Xorg context
>>> that hangs.
>>>
>>>> In other words I see two possibilities here, either we have a bug in
>>>> the scheduler or panfrost isn't using it correctly.
>>>>
>>>> Does panfrost call drm_sched_entity_flush() before it calls
>>>> drm_sched_entity_fini()? (I don't have the driver source at hand at
>>>> the moment).
>>> Panfrost doesn't use drm_sched_entity_flush(), nor
>>> drm_sched_entity_flush().
>> *nor drm_sched_entity_fini()
>
> Well that would mean that this is *really* buggy! How do you then end up
> in drm_sched_entity_kill()? From drm_sched_entity_destroy()?

Yes, from drm_sched_entity_destroy().

> drm_sched_entity_flush() should be called from the flush callback of
> the file_operations structure of panfrost. See amdgpu_flush() and
> amdgpu_ctx_mgr_entity_flush(). This makes sure that we wait for all
> entities of the process/file descriptor to be flushed out.
>
> drm_sched_entity_fini() must be called before you free the memory of
> the entity structure, or otherwise we would run into a use-after-free.

Right, drm_sched_entity_destroy() invokes these two functions and
Panfrost uses drm_sched_entity_destroy().

-- 
Best regards,
Dmitry