On 2018-09-10 9:04 a.m., Christian König wrote:
> Hi Tom,
>
> I'm talking about adding new printks to figure out what the heck is
> going wrong here.
>
> Thanks,
> Christian.

Hi Christian,

Sure, if you want to send me a simple patch that adds more printks I'll
gladly give it a try (doubly so since my workstation depends on our
staging tree to work properly...).

Tom

> Am 10.09.2018 um 14:59 schrieb Tom St Denis:
>> Hi Christian,
>>
>> Are you adding new traces or turning on existing ones? Would you like
>> me to try them out in my setup?
>>
>> Tom
>>
>> On 2018-09-10 8:49 a.m., Christian König wrote:
>>> Am 10.09.2018 um 14:05 schrieb Huang Rui:
>>>> On Mon, Sep 10, 2018 at 05:25:48PM +0800, Koenig, Christian wrote:
>>>>> Am 10.09.2018 um 11:23 schrieb Huang Rui:
>>>>>> On Mon, Sep 10, 2018 at 11:00:04AM +0200, Christian König wrote:
>>>>>>> Hi Ray,
>>>>>>>
>>>>>>> well, those patches don't make sense: the pointer is only local to
>>>>>>> the function.
>>>>>>
>>>>>> You're right.
>>>>>> I narrowed it down with a gdb dump from ttm_bo_bulk_move_lru_tail+0x2b;
>>>>>> the use-after-free should be in the code below:
>>>>>>
>>>>>> man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
>>>>>> ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
>>>>>>
>>>>>> Is there a case where the original bo is destroyed in the bulk pos,
>>>>>> but pos->first is not updated, so we still use it during the bulk
>>>>>> move?
>>>>>
>>>>> Only when a per VM BO is freed or the VM destroyed.
>>>>>
>>>>> The first case should now be handled by "drm/amdgpu: set bulk_moveable
>>>>> to false when a per VM is released", and when we use a destroyed VM we
>>>>> would see other problems as well.
>>>>>
>>>> If a VM instance is torn down, all BOs which belong to that VM should be
>>>> removed from the LRU. But how can we submit commands based on a destroyed
>>>> VM? You know, we do the bulk move at the last step of submission.
>>>
>>> Well, exactly, that's the point: this can't happen :)
>>>
>>> Otherwise we would crash because of using freed up memory much
>>> earlier in the command submission.
>>>
>>> The best idea I have to track this down further is to add some
>>> trace_printk in ttm_bo_bulk_move_helper and amdgpu_bo_destroy and see
>>> why and when we are actually using a destroyed BO.
>>>
>>> Christian.
>>>
>>>> Thanks,
>>>> Ray
>>>>
>>>>> BTW: Just pushed this commit to the repository, it should show up any
>>>>> second.
>>>>>
>>>>> Christian.
>>>>>
>>>>>> Thanks,
>>>>>> Ray
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 10.09.2018 um 10:57 schrieb Huang Rui:
>>>>>>>> It avoids the BO being referred to again after it has been freed.
>>>>>>>>
>>>>>>>> Signed-off-by: Huang Rui <ray.huang at amd.com>
>>>>>>>> Cc: Christian König <christian.koenig at amd.com>
>>>>>>>> Cc: Tom StDenis <Tom.StDenis at amd.com>
>>>>>>>> ---
>>>>>>>>   drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>>>>>>   1 file changed, 1 insertion(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> index 138c989..d3ef5f8 100644
>>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> @@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
>>>>>>>>   static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
>>>>>>>>   {
>>>>>>>>       kfree(bo);
>>>>>>>> +   bo = NULL;
>>>>>>>>   }
>>>>>>>>   static inline int ttm_mem_type_from_place(const struct
>>>>>>>> ttm_place *place,
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> amd-gfx mailing list
>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
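
[Editor's illustration] The instrumentation Christian proposes could look roughly like the sketch below. This is an untested illustration, not a real patch: the hunk contexts are abbreviated with `...`, and the exact signatures in the staging tree of that era may differ.

```diff
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ static void ttm_bo_bulk_move_helper(...)
 {
+	/* log which BOs the bulk move is about to touch */
+	trace_printk("bulk move: first=%p last=%p\n", pos->first, pos->last);
 	...
 }

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
 {
+	/* log every BO teardown so destroy vs. bulk-move interleavings
+	 * show up in the same trace */
+	trace_printk("destroy bo=%p\n", tbo);
 	...
 }
```

Both streams land in the ftrace ring buffer (`/sys/kernel/debug/tracing/trace`), so a BO that is destroyed and later reappears as `pos->first` would be directly visible in the interleaved output.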