Am 18.04.2018 um 11:00 schrieb Liu, Monk: > > 1.Drm_sched_entity_fini(): it exit right after entity->job_queue > empty, [ but that time scheduler is not fast enough to deal with this > entity now ] > > That should never happen. > > The last job from the entity->job_queue is only removed after the > scheduler is done with the entity (at least that was the original > idea, not sure if that still works as expected). > > [ML] no thatâ??s not true and we already catch the kernel NULL pointer > issue with a entity->last_scheduled fence get double put , exactly > like the way I described in the scenario â?¦ > > Pixel already fixed it by moving the put/get pair on > entity->last_scheduled prior to spsc_queue_pop() and the race issue is > therefore avoided > Yeah, already seen and reviewed that. That's a good catch, please make sure that this gets pushed to amd-staging-drm-next ASAP. Christian. > /Monk > > *From:*Christian König [mailto:ckoenig.leichtzumerken at gmail.com] > *Sent:* 2018å¹´4æ??18æ?¥16:36 > *To:* Liu, Monk <Monk.Liu at amd.com>; Koenig, Christian > <Christian.Koenig at amd.com>; Deng, Emily <Emily.Deng at amd.com> > *Cc:* amd-gfx at lists.freedesktop.org > *Subject:* Re: force app kill patch > > See that in â??sched_entity_finiâ??, we only call > dma_fence_put(entity->last_scheduledâ?? under the condition of â??If > (entity->fini_status)â??, so > > This way there is memory leak for the case of â??entity->fini_stats ==0â?? > > Good catch, we indeed should fix that. > > > 1.Drm_sched_entity_fini(): it exit right after entity->job_queue > empty, [ but that time scheduler is not fast enough to deal with > this entity now ] > > That should never happen. > > The last job from the entity->job_queue is only removed after the > scheduler is done with the entity (at least that was the original > idea, not sure if that still works as expected). > > Regards, > Christian. > > Am 18.04.2018 um 09:20 schrieb Liu, Monk: > > *Correctio for the scenario * > > After we move fence_put(entity->last_sched) out of the fini_status > check: > > A potential race issue for the scenario: > > 1.Drm_sched_entity_fini(): it exit right after entity->job_queue > empty, [ but that time scheduler is not fast enough to deal with > this entity now ] > > 2.Drm_sched_entity_cleanup() : it call > dma_fence_put(entity->last_scheduled)  [ but this time > entity->last_scheduled actually points to the fence prior to the > real last one ] > > 3.Scheduler_main() now dealing with this entity: it call > dma_fence_put(entity->last_scheduled)   [  Now this fence get > double put !!!  ] > > 4.Scheduler_main() now call dma_fence_get() on the *real* last one ! > > So eventually the real last one fence triggers memory leak and > more critical the double put fence cause NULL pointer access > > /Monk > > *From:*Liu, Monk > *Sent:* 2018å¹´4æ??18æ?¥15:11 > *To:* Koenig, Christian <Christian.Koenig at amd.com> > <mailto:Christian.Koenig at amd.com>; Deng, Emily > <Emily.Deng at amd.com> <mailto:Emily.Deng at amd.com> > *Cc:* amd-gfx at lists.freedesktop.org > <mailto:amd-gfx at lists.freedesktop.org> > *Subject:* force app kill patch > > Hi Christian & Emily > > I think the v4 fix for â??fix force app kill hangâ??is still not good > enough: > > First: > > See that in â??sched_entity_finiâ??, we only call > dma_fence_put(entity->last_scheduledâ??under the condition of â??If > (entity->fini_status)â??, so > > This way there is memory leak for the case of â??entity->fini_stats ==0â?? > > Second: > > If we move dma_fence_put(entity->last_scheduled) out of the > condition of â??if (entity->fini_status)â??, the memory leak issue can > be fixed > > But there will be kernel NULL pointer access, I think the time you > call dma_fence_put(entity->last_scheduledâ??) may actually executed > **not** > > On the last scheduled fence of this entity, because it is run > without â??thread_park/unparkâ??pair which to make sure scheduler not > dealing this entity > > So with certain race issue, here is the scenario: > > 1.scheduler is doing the dma_fence_put() on the 1^st fence, > > 2.scheduler set entity->last_scheduled to 1^st fence > > 3.now sched_entity_fini() run, and it call dma_fence_put() on > entity->last_scheduled > > 4.now this 1^st fence is actually put double time and the real > last fence wonâ??t get put by expected > > any idea? > > /Monk > > > > > _______________________________________________ > > amd-gfx mailing list > > amd-gfx at lists.freedesktop.org <mailto:amd-gfx at lists.freedesktop.org> > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > > > > _______________________________________________ > amd-gfx mailing list > amd-gfx at lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180418/b89b9f33/attachment-0001.html>