*Correctio for the scenario * After we move fence_put(entity->last_sched) out of the fini_status check: A potential race issue for the scenario: 1. Drm_sched_entity_fini(): it exit right after entity->job_queue empty, [ but that time scheduler is not fast enough to deal with this entity now ] 2. Drm_sched_entity_cleanup() : it call dma_fence_put(entity->last_scheduled) [ but this time entity->last_scheduled actually points to the fence prior to the real last one ] 3. Scheduler_main() now dealing with this entity: it call dma_fence_put(entity->last_scheduled) [ Now this fence get double put !!! ] 4. Scheduler_main() now call dma_fence_get() on the *real* last one ! So eventually the real last one fence triggers memory leak and more critical the double put fence cause NULL pointer access /Monk From: Liu, Monk Sent: 2018å¹´4æ??18æ?¥ 15:11 To: Koenig, Christian <Christian.Koenig at amd.com>; Deng, Emily <Emily.Deng at amd.com> Cc: amd-gfx at lists.freedesktop.org Subject: force app kill patch Hi Christian & Emily I think the v4 fix for â??fix force app kill hangâ?? is still not good enough: First: See that in â??sched_entity_finiâ??, we only call dma_fence_put(entity->last_scheduledâ?? under the condition of â??If (entity->fini_status)â??, so This way there is memory leak for the case of â??entity->fini_stats ==0â?? Second: If we move dma_fence_put(entity->last_scheduled) out of the condition of â??if (entity->fini_status)â??, the memory leak issue can be fixed But there will be kernel NULL pointer access, I think the time you call dma_fence_put(entity->last_scheduledâ??) may actually executed *not* On the last scheduled fence of this entity, because it is run without â??thread_park/unparkâ?? pair which to make sure scheduler not dealing this entity So with certain race issue, here is the scenario: 1. scheduler is doing the dma_fence_put() on the 1st fence, 2. scheduler set entity->last_scheduled to 1st fence 3. now sched_entity_fini() run, and it call dma_fence_put() on entity->last_scheduled 4. now this 1st fence is actually put double time and the real last fence wonâ??t get put by expected any idea? /Monk -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180418/c8b33ffe/attachment.html>