Ok, I'm lost. Where do we use call_rcu() twice? Because that sounds incorrect in the first place.

Christian.

On 18.05.2018 at 11:41, Deng, Emily wrote:
> Ping......
>
> Best Wishes,
> Emily Deng
>
>> -----Original Message-----
>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
>> Of Deng, Emily
>> Sent: Friday, May 18, 2018 11:20 AM
>> To: Koenig, Christian <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org
>> Subject: RE: [PATCH] drm/amdgpu: add rcu_barrier after entity fini
>>
>> Hi Christian,
>> Yes, there is already one rcu_barrier, but call_rcu is called twice, so
>> the one rcu_barrier can only cover one of the call_rcu stages in some cases.
>> After I added another rcu_barrier, the kernel issue disappeared.
>>
>> Best Wishes,
>> Emily Deng
>>
>>> -----Original Message-----
>>> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com]
>>> Sent: Thursday, May 17, 2018 7:08 PM
>>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: add rcu_barrier after entity fini
>>>
>>> On 17.05.2018 at 12:03, Emily Deng wrote:
>>>> Freeing a fence from the amdgpu_fence_slab takes two call_rcu passes.
>>>> To avoid amdgpu_fence_slab_fini calling
>>>> kmem_cache_destroy(amdgpu_fence_slab) before
>>>> kmem_cache_free(amdgpu_fence_slab, fence), add an rcu_barrier after
>>>> drm_sched_entity_fini.
>>>>
>>>> The call trace leading to kmem_cache_free(amdgpu_fence_slab, fence) is:
>>>>
>>>> 1. drm_sched_entity_fini ->
>>>>    drm_sched_entity_cleanup ->
>>>>    dma_fence_put(entity->last_scheduled) ->
>>>>    drm_sched_fence_release_finished ->
>>>>    drm_sched_fence_release_scheduled ->
>>>>    call_rcu(&fence->finished.rcu, drm_sched_fence_free)
>>>>
>>>> 2. drm_sched_fence_free ->
>>>>    dma_fence_put(fence->parent) ->
>>>>    amdgpu_fence_release ->
>>>>    call_rcu(&f->rcu, amdgpu_fence_free) ->
>>>>    kmem_cache_free(amdgpu_fence_slab, fence);
>>>>
>>>> v2: put the barrier before the kmem_cache_destroy
>>>>
>>>> Change-Id: I8dcadd3372f97e72461bf46b41cc26d90f09b8df
>>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> index 39ec6b8..42be65b 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> @@ -69,6 +69,7 @@ int amdgpu_fence_slab_init(void)
>>>>  void amdgpu_fence_slab_fini(void)
>>>>  {
>>>>  	rcu_barrier();
>>>> +	rcu_barrier();
>>>
>>> Well, you should have noted that there is already an rcu_barrier here,
>>> and adding another one shouldn't have any additional effect. So your
>>> explanation and the proposed solution don't make much sense.
>>>
>>> I think the problem you run into is rather that the fence is reference
>>> counted and might live longer than the module that created it.
>>>
>>> Complicated issue; one possible solution would be to release
>>> fence->parent earlier in the scheduler fence, but that doesn't sound
>>> like a general purpose solution.
>>>
>>> Christian.
>>>
>>>>  	kmem_cache_destroy(amdgpu_fence_slab);
>>>>  }
>>>>  /*
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
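
For reference, the ordering problem Emily describes is the generic two-stage
call_rcu() pattern: a first RCU callback queues a second call_rcu(), so a
single rcu_barrier() before kmem_cache_destroy() only covers callbacks that
were already queued, not the ones queued by those callbacks. Below is a
minimal sketch of that pattern, assuming hypothetical names (example_slab,
outer_obj, inner_obj, outer_free, inner_free); it is not the actual amdgpu or
scheduler code, only an illustration of why a chain that is two callbacks deep
needs two barriers.

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

static struct kmem_cache *example_slab;	/* hypothetical slab cache */

struct inner_obj {
	struct rcu_head rcu;			/* objects allocated from example_slab */
};

struct outer_obj {
	struct rcu_head rcu;
	struct inner_obj *inner;
};

/* Second-stage callback: finally returns the slab object. */
static void inner_free(struct rcu_head *rcu)
{
	struct inner_obj *inner = container_of(rcu, struct inner_obj, rcu);

	kmem_cache_free(example_slab, inner);
}

/* First-stage callback: queues the second call_rcu() from callback context. */
static void outer_free(struct rcu_head *rcu)
{
	struct outer_obj *outer = container_of(rcu, struct outer_obj, rcu);

	call_rcu(&outer->inner->rcu, inner_free);
	kfree(outer);
}

static void outer_release(struct outer_obj *outer)
{
	call_rcu(&outer->rcu, outer_free);
}

static void example_slab_fini(void)
{
	/*
	 * One rcu_barrier() only waits for callbacks already queued at this
	 * point, i.e. outer_free(). The inner_free() callbacks that
	 * outer_free() queues are not covered by it, so a second barrier is
	 * needed before the slab can safely be destroyed.
	 */
	rcu_barrier();	/* waits for outer_free() */
	rcu_barrier();	/* waits for inner_free() queued by outer_free() */
	kmem_cache_destroy(example_slab);
}

Whether this sketch matches the actual failure is exactly what the thread
disputes: Christian's point is that a reference-counted fence can outlive the
module that created it, in which case no number of barriers at module fini
fully closes the window.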