On 2018-09-10 9:04 a.m., Christian König wrote:
> Hi Tom,
>
> I'm talking about adding new printks to figure out what the heck is
> going wrong here.
>
> Thanks,
> Christian.

Hi Christian,

Sure, if you want to send me a simple patch that adds more printks I'll
gladly give it a try (doubly so since my workstation depends on our
staging tree to work properly...).

Tom

> Am 10.09.2018 um 14:59 schrieb Tom St Denis:
>> Hi Christian,
>>
>> Are you adding new traces or turning on existing ones? Would you like
>> me to try them out in my setup?
>>
>> Tom
>>
>> On 2018-09-10 8:49 a.m., Christian König wrote:
>>> Am 10.09.2018 um 14:05 schrieb Huang Rui:
>>>> On Mon, Sep 10, 2018 at 05:25:48PM +0800, Koenig, Christian wrote:
>>>>> Am 10.09.2018 um 11:23 schrieb Huang Rui:
>>>>>> On Mon, Sep 10, 2018 at 11:00:04AM +0200, Christian König wrote:
>>>>>>> Hi Ray,
>>>>>>>
>>>>>>> well, those patches don't make sense: the pointer is only local to
>>>>>>> the function.
>>>>>>
>>>>>> You're right.
>>>>>> I narrowed it down with a gdb dump from ttm_bo_bulk_move_lru_tail+0x2b;
>>>>>> the use-after-free should be in the code below:
>>>>>>
>>>>>> man = &bulk->tt[i].first->bdev->man[TTM_PL_TT];
>>>>>> ttm_bo_bulk_move_helper(&bulk->tt[i], &man->lru[i], false);
>>>>>>
>>>>>> Is there a case where the original bo is destroyed in the bulk pos,
>>>>>> but pos->first is not updated, so we still use it during the bulk
>>>>>> move?
>>>>>
>>>>> Only when a per VM BO is freed or the VM destroyed.
>>>>>
>>>>> The first case should now be handled by "drm/amdgpu: set bulk_moveable
>>>>> to false when a per VM is released", and when we use a destroyed VM we
>>>>> would see other problems as well.
>>>>>
>>>> If a VM instance is torn down, all BOs which belong to that VM should be
>>>> removed from the LRU. But how can we submit commands based on a destroyed
>>>> VM? You know, we do the bulk move at the last step of submission.
>>>
>>> Well, exactly, that's the point: this can't happen :)
>>>
>>> Otherwise we would crash because of using freed up memory much
>>> earlier in the command submission.
>>>
>>> The best idea I have to track this down further is to add some
>>> trace_printk in ttm_bo_bulk_move_helper and amdgpu_bo_destroy and see
>>> why and when we are actually using a destroyed BO.
>>>
>>> Christian.
>>>
>>>> Thanks,
>>>> Ray
>>>>
>>>>> BTW: Just pushed this commit to the repository, it should show up any
>>>>> second.
>>>>>
>>>>> Christian.
>>>>>
>>>>>> Thanks,
>>>>>> Ray
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 10.09.2018 um 10:57 schrieb Huang Rui:
>>>>>>>> It avoids the BO being referred to again after it has been freed.
>>>>>>>>
>>>>>>>> Signed-off-by: Huang Rui <ray.huang at amd.com>
>>>>>>>> Cc: Christian König <christian.koenig at amd.com>
>>>>>>>> Cc: Tom StDenis <Tom.StDenis at amd.com>
>>>>>>>> ---
>>>>>>>>   drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>>>>>>>>   1 file changed, 1 insertion(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> index 138c989..d3ef5f8 100644
>>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> @@ -54,6 +54,7 @@ static struct attribute ttm_bo_count = {
>>>>>>>>   static void ttm_bo_default_destroy(struct ttm_buffer_object *bo)
>>>>>>>>   {
>>>>>>>>       kfree(bo);
>>>>>>>> +   bo = NULL;
>>>>>>>>   }
>>>>>>>>   static inline int ttm_mem_type_from_place(const struct
>>>>>>>> ttm_place *place,
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> amd-gfx mailing list
>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
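
[Editor's illustration] The instrumentation Christian proposes could look roughly like the sketch below. This is an untested illustration, not a real patch: the hunk contexts are abbreviated with `...`, and the exact signatures in the staging tree of that era may differ.

```diff
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ static void ttm_bo_bulk_move_helper(...)
 {
+	/* log which BOs the bulk move is about to touch */
+	trace_printk("bulk move: first=%p last=%p\n", pos->first, pos->last);
 	...
 }

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
 {
+	/* log every BO teardown so destroy vs. bulk-move interleavings
+	 * show up in the same trace */
+	trace_printk("destroy bo=%p\n", tbo);
 	...
 }
```

Both streams land in the ftrace ring buffer (`/sys/kernel/debug/tracing/trace`), so a BO that is destroyed and later reappears as `pos->first` would be directly visible in the interleaved output.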