Hi Christian, When we free an IB fence, we first call one call_rcu in drm_sched_fence_release_scheduled as the call trace one, then after the call trace one, we call the call_rcu second in the amdgpu_fence_release in call trace two, as below. The kmem_cache_free(amdgpu_fence_slab, fence)'s call trace as below: 1.drm_sched_entity_fini -> drm_sched_entity_cleanup -> dma_fence_put(entity->last_scheduled) -> drm_sched_fence_release_finished -> drm_sched_fence_release_scheduled -> call_rcu(&fence->finished.rcu, drm_sched_fence_free) 2.drm_sched_fence_free -> dma_fence_put(fence->parent) -> amdgpu_fence_release -> call_rcu(&f->rcu, amdgpu_fence_free) -> kmem_cache_free(amdgpu_fence_slab, fence); > -----Original Message----- > From: Koenig, Christian > Sent: Friday, May 18, 2018 5:46 PM > To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org > Subject: Re: [PATCH] drm/amdgpu: add rcu_barrier after entity fini > > Ok, I'm lost where do we use call_rcu() twice? Cause that sounds incorrect in > the first place. > > Christian. > > Am 18.05.2018 um 11:41 schrieb Deng, Emily: > > Ping...... > > > > Best Wishes, > > Emily Deng > > > > > > > > > >> -----Original Message----- > >> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On > >> Behalf Of Deng, Emily > >> Sent: Friday, May 18, 2018 11:20 AM > >> To: Koenig, Christian <Christian.Koenig at amd.com<mailto:Christian.Koenig at amd.com>>; amd- > >> gfx at lists.freedesktop.org<mailto:gfx at lists.freedesktop.org> > >> Subject: RE: [PATCH] drm/amdgpu: add rcu_barrier after entity fini > >> > >> Hi Christian, > >> Yes, it has already one rcu_barrier, but it has called twice > >> call_rcu, so the one rcu_barrier just could barrier one call_rcu some time. > >> After I added another rcu_barrier, the kernel issue will disappear. > >> > >> Best Wishes, > >> Emily Deng > >> > >>> -----Original Message----- > >>> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com] > >>> Sent: Thursday, May 17, 2018 7:08 PM > >>> To: Deng, Emily <Emily.Deng at amd.com<mailto:Emily.Deng at amd.com>>; amd-<mailto:amd-gfx at lists.freedesktop.org> > gfx at lists.freedesktop.org > >>> Subject: Re: [PATCH] drm/amdgpu: add rcu_barrier after entity fini > >>> > >>> Am 17.05.2018 um 12:03 schrieb Emily Deng: > >>>> To free the fence from the amdgpu_fence_slab, need twice call_rcu, > >>>> to avoid the amdgpu_fence_slab_fini call > >>>> kmem_cache_destroy(amdgpu_fence_slab) before > >>> kmem_cache_free(amdgpu_fence_slab, fence), add rcu_barrier after > >>> drm_sched_entity_fini. > >>>> The kmem_cache_free(amdgpu_fence_slab, fence)'s call trace as > below: > >>>> 1.drm_sched_entity_fini -> > >>>> drm_sched_entity_cleanup -> > >>>> dma_fence_put(entity->last_scheduled) -> > >>>> drm_sched_fence_release_finished -> > >>> drm_sched_fence_release_scheduled > >>>> -> call_rcu(&fence->finished.rcu, drm_sched_fence_free) > >>>> > >>>> 2.drm_sched_fence_free -> > >>>> dma_fence_put(fence->parent) -> > >>>> amdgpu_fence_release -> > >>>> call_rcu(&f->rcu, amdgpu_fence_free) -> > >>>> kmem_cache_free(amdgpu_fence_slab, fence); > >>>> > >>>> v2:put the barrier before the kmem_cache_destroy > >>>> > >>>> Change-Id: I8dcadd3372f97e72461bf46b41cc26d90f09b8df > >>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com<mailto:Emily.Deng at amd.com>> > >>>> --- > >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 + > >>>> 1 file changed, 1 insertion(+) > >>>> > >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > >>>> index 39ec6b8..42be65b 100644 > >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > >>>> @@ -69,6 +69,7 @@ int amdgpu_fence_slab_init(void) > >>>> void amdgpu_fence_slab_fini(void) > >>>> { > >>>> rcu_barrier(); > >>>> + rcu_barrier(); > >>> Well, you should have noted that there is already an rcu_barrier > >>> here and adding another one shouldn't have any additional effect. So > >>> your explanation and the proposed solution doesn't make to much > sense. > >>> > >>> I think the problem you run into is rather that the fence is > >>> reference counted and might live longer than the module who created it. > >>> > >>> Complicated issue, one possible solution would be to release > >>> fence->parent earlier in the scheduler fence but that doesn't sound > >>> fence->like > >>> a general purpose solution. > >>> > >>> Christian. > >>> > >>>> kmem_cache_destroy(amdgpu_fence_slab); > >>>> } > >>>> /* > >> _______________________________________________ > >> amd-gfx mailing list > >> amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org> > >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180518/5431cd34/attachment.html>