Re: [PATCH 12/13] drm/scheduler: rework entity flush, kill and fini

On 17.11.22 at 14:00, Dmitry Osipenko wrote:
On 11/17/22 15:59, Dmitry Osipenko wrote:
On 11/17/22 15:55, Christian König wrote:
On 17.11.22 at 13:47, Dmitry Osipenko wrote:
On 11/17/22 12:53, Christian König wrote:
On 17.11.22 at 03:36, Dmitry Osipenko wrote:
Hi,

On 10/14/22 11:46, Christian König wrote:
+/* Remove the entity from the scheduler and kill all pending jobs */
+static void drm_sched_entity_kill(struct drm_sched_entity *entity)
+{
+    struct drm_sched_job *job;
+    struct dma_fence *prev;
+
+    if (!entity->rq)
+        return;
+
+    spin_lock(&entity->rq_lock);
+    entity->stopped = true;
+    drm_sched_rq_remove_entity(entity->rq, entity);
+    spin_unlock(&entity->rq_lock);
+
+    /* Make sure this entity is not used by the scheduler at the moment */
+    wait_for_completion(&entity->entity_idle);
I'm always hitting a lockup here with the Panfrost driver when terminating
Xorg. Reverting this patch helps. Any ideas how to fix it?

Well is the entity idle or are there some unsubmitted jobs left?
Do you mean unsubmitted to h/w? IIUC, there are unsubmitted jobs left.

I see that there are 5-6 incomplete (in-flight) jobs when
panfrost_job_close() is invoked.

There are 1-2 jobs that keep getting scheduled and finish once every
few seconds after the lockup happens.
Well what drm_sched_entity_kill() is supposed to do is to prevent
pushing queued up stuff to the hw when the process which queued it is
killed. Is the process really killed or is that just some incorrect
handling?
It's actually 5-6 incomplete jobs of Xorg that are hanging when the Xorg
process is closed.

The two re-scheduled jobs are from sddm, so it's only the Xorg context
that hangs.

In other words I see two possibilities here, either we have a bug in the
scheduler or panfrost isn't using it correctly.

Does panfrost call drm_sched_entity_flush() before it calls
drm_sched_entity_fini()? (I don't have the driver source at hand at the
moment).
Panfrost uses neither drm_sched_entity_flush() nor drm_sched_entity_fini().

Well that would mean that this is *really* buggy! How do you then end up in drm_sched_entity_kill()? From drm_sched_entity_destroy()?

drm_sched_entity_flush() should be called from the flush callback of panfrost's file_operations structure. See amdgpu_flush() and amdgpu_ctx_mgr_entity_flush(). This makes sure that we wait for all entities of the process/file descriptor to be flushed out.

drm_sched_entity_fini() must be called before you free the memory of the entity structure, otherwise we would run into a use-after-free.

Regards,
Christian.
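
For readers following the thread, the ordering described above (flush from the file_operations flush callback, then fini before the entity's memory is freed) can be modelled roughly as below. This is only a userspace sketch, not driver code: the two drm_sched_* functions are stand-in stubs that just record state so the example compiles outside the kernel (the real ones are declared in <drm/gpu_scheduler.h>), and every example_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the kernel structure; only tracks what the stubs did. */
struct drm_sched_entity {
	bool flushed;   /* stub state: flush has run */
	bool finished;  /* stub state: fini has run */
};

/* Stub. The kernel function waits, up to timeout, for queued jobs to drain. */
static long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
{
	entity->flushed = true;
	return timeout;
}

/* Stub. The kernel function detaches the entity from the scheduler. */
static void drm_sched_entity_fini(struct drm_sched_entity *entity)
{
	entity->finished = true;
}

/* Hypothetical per-file driver state that embeds one scheduler entity. */
struct example_file_priv {
	struct drm_sched_entity entity;
};

/* Step 1: would be called from the driver's file_operations flush callback. */
static long example_flush(struct example_file_priv *priv, long timeout)
{
	return drm_sched_entity_flush(&priv->entity, timeout);
}

/* Step 2: on final close, fini runs first and only then is the memory freed. */
static void example_release(struct example_file_priv *priv)
{
	drm_sched_entity_fini(&priv->entity);
	assert(priv->entity.finished); /* entity detached, freeing is now safe */
	free(priv);
}
```

The point of the model is only the order of the calls: freeing priv before drm_sched_entity_fini() has run is the use-after-free case mentioned above.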


