Hi Tom,

I'm talking about adding new printks to figure out what the heck is
going wrong here.

Thanks,
Christian.

Am 10.09.2018 um 14:59 schrieb Tom St Denis:
> Hi Christian,
>
> Are you adding new traces or turning on existing ones? Would you like
> me to try them out in my setup?
>
> Tom
>
> On 2018-09-10 8:49 a.m., Christian König wrote:
>> Am 10.09.2018 um 14:05 schrieb Huang Rui:
>>> On Mon, Sep 10, 2018 at 05:25:48PM +0800, Koenig, Christian wrote:
>>>> Am 10.09.2018 um 11:23 schrieb Huang Rui:
>>>>> On Mon, Sep 10, 2018 at 11:00:04AM +0200, Christian König wrote:
>>>>>> Hi Ray,
>>>>>>
>>>>>> well, those patches don't make sense; the pointer is only local
>>>>>> to the function.
>>>>> You're right.
>>>>> I narrowed it down with a gdb dump at ttm_bo_bulk_move_lru_tail+0x2b;
>>>>> the use-after-free should be in the code below:
>>>>>
>>>>>     man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
>>>>>     ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
>>>>>
>>>>> Is there a case where the original BO in the bulk pos is destroyed,
>>>>> but the pos->first pointer is not updated, so we still use it
>>>>> during the bulk move?
>>>> Only when a per-VM BO is freed or the VM is destroyed.
>>>>
>>>> The first case should now be handled by "drm/amdgpu: set bulk_moveable
>>>> to false when a per VM is released", and if we used a destroyed VM we
>>>> would see other problems as well.
>>>>
>>> If a VM instance is torn down, all BOs which belong to that VM should
>>> be removed from the LRU. But how could we submit a command based on a
>>> destroyed VM? After all, we do the bulk move as the last step of
>>> submission.
>>
>> Well, exactly, that's the point: this can't happen :)
>>
>> Otherwise we would crash from using freed memory much earlier in the
>> command submission.
>>
>> The best idea I have to track this down further is to add some
>> trace_printk calls in ttm_bo_bulk_move_helper and amdgpu_bo_destroy
>> and see why and when we are actually using a destroyed BO.
>>
>> Christian.
>>
>>>
>>>
>>> Thanks,
>>> Ray
>>>
>>>> BTW: I just pushed this commit to the repository; it should show up
>>>> any second.
>>>>
>>>> Christian.
>>>>
>>>>> Thanks,
>>>>> Ray
>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>> Am 10.09.2018 um 10:57 schrieb Huang Rui:
>>>>>>> This prevents the pointer from being referenced again after it is freed.
>>>>>>>
>>>>>>> Signed-off-by: Huang Rui <ray.huang at amd.com>
>>>>>>> Cc: Christian König <christian.koenig at amd.com>
>>>>>>> Cc: Tom StDenis <Tom.StDenis at amd.com>
>>>>>>> ---
>>>>>>>  drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>>>>>  1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> index 138c989..d3ef5f8 100644
>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>> @@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
>>>>>>>  static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
>>>>>>>  {
>>>>>>>      kfree(bo);
>>>>>>> +    bo = NULL;
>>>>>>>  }
>>>>>>>  static inline int ttm_mem_type_from_place(const struct ttm_place *place,
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx at lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>