Ok, I'm lost. Where do we use call_rcu() twice? Because that sounds incorrect in the first place.

Christian.

On 18.05.2018 at 11:41, Deng, Emily wrote:
> Ping......
>
> Best Wishes,
> Emily Deng
>
>> -----Original Message-----
>> From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
>> Of Deng, Emily
>> Sent: Friday, May 18, 2018 11:20 AM
>> To: Koenig, Christian <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org
>> Subject: RE: [PATCH] drm/amdgpu: add rcu_barrier after entity fini
>>
>> Hi Christian,
>> Yes, there is already one rcu_barrier, but call_rcu is called twice, so
>> the one rcu_barrier can only cover one of the call_rcu stages in some cases.
>> After I added another rcu_barrier, the kernel issue disappeared.
>>
>> Best Wishes,
>> Emily Deng
>>
>>> -----Original Message-----
>>> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com]
>>> Sent: Thursday, May 17, 2018 7:08 PM
>>> To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: add rcu_barrier after entity fini
>>>
>>> On 17.05.2018 at 12:03, Emily Deng wrote:
>>>> Freeing a fence from the amdgpu_fence_slab takes two call_rcu passes.
>>>> To avoid amdgpu_fence_slab_fini calling
>>>> kmem_cache_destroy(amdgpu_fence_slab) before
>>>> kmem_cache_free(amdgpu_fence_slab, fence), add an rcu_barrier after
>>>> drm_sched_entity_fini.
>>>>
>>>> The call trace leading to kmem_cache_free(amdgpu_fence_slab, fence) is:
>>>>
>>>> 1. drm_sched_entity_fini ->
>>>>    drm_sched_entity_cleanup ->
>>>>    dma_fence_put(entity->last_scheduled) ->
>>>>    drm_sched_fence_release_finished ->
>>>>    drm_sched_fence_release_scheduled ->
>>>>    call_rcu(&fence->finished.rcu, drm_sched_fence_free)
>>>>
>>>> 2. drm_sched_fence_free ->
>>>>    dma_fence_put(fence->parent) ->
>>>>    amdgpu_fence_release ->
>>>>    call_rcu(&f->rcu, amdgpu_fence_free) ->
>>>>    kmem_cache_free(amdgpu_fence_slab, fence);
>>>>
>>>> v2: put the barrier before the kmem_cache_destroy
>>>>
>>>> Change-Id: I8dcadd3372f97e72461bf46b41cc26d90f09b8df
>>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> index 39ec6b8..42be65b 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> @@ -69,6 +69,7 @@ int amdgpu_fence_slab_init(void)
>>>>  void amdgpu_fence_slab_fini(void)
>>>>  {
>>>>  	rcu_barrier();
>>>> +	rcu_barrier();
>>>
>>> Well, you should have noted that there is already an rcu_barrier here,
>>> and adding another one shouldn't have any additional effect. So your
>>> explanation and the proposed solution don't make much sense.
>>>
>>> I think the problem you run into is rather that the fence is reference
>>> counted and might live longer than the module that created it.
>>>
>>> Complicated issue; one possible solution would be to release
>>> fence->parent earlier in the scheduler fence, but that doesn't sound
>>> like a general purpose solution.
>>>
>>> Christian.
>>>
>>>>  	kmem_cache_destroy(amdgpu_fence_slab);
>>>>  }
>>>>  /*
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
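
For reference, the ordering problem Emily describes is the generic two-stage
call_rcu() pattern: a first RCU callback queues a second call_rcu(), so a
single rcu_barrier() before kmem_cache_destroy() only covers callbacks that
were already queued, not the ones queued by those callbacks. Below is a
minimal sketch of that pattern, assuming hypothetical names (example_slab,
outer_obj, inner_obj, outer_free, inner_free); it is not the actual amdgpu or
scheduler code, only an illustration of why a chain that is two callbacks deep
needs two barriers.

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

static struct kmem_cache *example_slab;	/* hypothetical slab cache */

struct inner_obj {
	struct rcu_head rcu;			/* objects allocated from example_slab */
};

struct outer_obj {
	struct rcu_head rcu;
	struct inner_obj *inner;
};

/* Second-stage callback: finally returns the slab object. */
static void inner_free(struct rcu_head *rcu)
{
	struct inner_obj *inner = container_of(rcu, struct inner_obj, rcu);

	kmem_cache_free(example_slab, inner);
}

/* First-stage callback: queues the second call_rcu() from callback context. */
static void outer_free(struct rcu_head *rcu)
{
	struct outer_obj *outer = container_of(rcu, struct outer_obj, rcu);

	call_rcu(&outer->inner->rcu, inner_free);
	kfree(outer);
}

static void outer_release(struct outer_obj *outer)
{
	call_rcu(&outer->rcu, outer_free);
}

static void example_slab_fini(void)
{
	/*
	 * One rcu_barrier() only waits for callbacks already queued at this
	 * point, i.e. outer_free(). The inner_free() callbacks that
	 * outer_free() queues are not covered by it, so a second barrier is
	 * needed before the slab can safely be destroyed.
	 */
	rcu_barrier();	/* waits for outer_free() */
	rcu_barrier();	/* waits for inner_free() queued by outer_free() */
	kmem_cache_destroy(example_slab);
}

Whether this sketch matches the actual failure is exactly what the thread
disputes: Christian's point is that a reference-counted fence can outlive the
module that created it, in which case no number of barriers at module fini
fully closes the window.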