On 05/25/2018 05:35 PM, Christian König wrote:
> Am 25.05.2018 um 10:23 schrieb Zhang, Jerry (Junwei):
>> On 05/25/2018 03:54 PM, Christian König wrote:
>>> Am 25.05.2018 um 09:20 schrieb Zhang, Jerry (Junwei):
>>>> On 05/25/2018 02:44 PM, Christian König wrote:
>>>>> NAK, that probably just fixed the symptom but not the underlying problem.
>>>>>
>>>>> Somebody is accessing the page array when it should never be accessed.
>>>>
>>>> If prime import as GTT bo by default (now it's CPU bo), it would happen
>>>> quickly at GTT sg bo creation rather than at the next cs validation.
>>>>
>>>> Since ttm_sg_tt_init() only allocates gtt->ttm.dma_address when an sg bo
>>>> is created, accessing ttm->pages fails at ttm populate.
>>>
>>> And exactly that's the problem, an imported BO should never be populated.
>>>
>>>>
>>>> The current error happens in ttm populate from cs validation; the sg bo
>>>> is imported from the exporter.
>>>>
>>>>>
>>>>> How did you manage to trigger this?
>>>>
>>>> DRI_PRIME=1 with Unigine Heaven.
>>>
>>> Going to give that a try, but the last time I checked that worked as expected.
>>
>> FYI.
>> DRI_PRIME=1 glxinfo will not trigger that, but the game does.
>
> Just tested and it works perfectly fine.
>
> Is that on the closed stack or the open stack?

I used the unified driver (latest 18.20 build) + drm-next kernel, installed
as an all-open stack on an A+A platform.
(The issue was found with the 18.20 build, all-open stack (DKMS driver).)

BTW, how did you get the UMD? Via apt-get, or did you build it yourself?

Jerry

>
> Christian.
>
>>
>> Jerry
>>
>>>
>>> Thanks,
>>> Christian.
>>>
>>>>
>>>> Regards,
>>>> Jerry
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 25.05.2018 um 07:41 schrieb Junwei Zhang:
>>>>>> [ 632.679861] BUG: unable to handle kernel NULL pointer dereference at
>>>>>> (null)
>>>>>> [ 632.679892] IP: drm_prime_sg_to_page_addr_arrays+0x52/0xb0 [drm]
>>>>>> <snip>
>>>>>> [ 632.680011] Call Trace:
>>>>>> [ 632.680082] amdgpu_ttm_tt_populate+0x3e/0xa0 [amdgpu]
>>>>>> [ 632.680092] ttm_tt_populate.part.7+0x22/0x60 [amdttm]
>>>>>> [ 632.680098] amdttm_tt_bind+0x52/0x60 [amdttm]
>>>>>> [ 632.680106] ttm_bo_handle_move_mem+0x54b/0x5c0 [amdttm]
>>>>>> [ 632.680112] ? find_next_bit+0xb/0x10
>>>>>> [ 632.680119] amdttm_bo_validate+0x11d/0x130 [amdttm]
>>>>>> [ 632.680176] amdgpu_cs_bo_validate+0x9d/0x150 [amdgpu]
>>>>>> [ 632.680232] amdgpu_cs_validate+0x41/0x270 [amdgpu]
>>>>>> [ 632.680288] amdgpu_cs_list_validate+0xc7/0x1a0 [amdgpu]
>>>>>> [ 632.680343] amdgpu_cs_ioctl+0x1634/0x1c00 [amdgpu]
>>>>>> [ 632.680401] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
>>>>>> [ 632.680416] drm_ioctl_kernel+0x6b/0xb0 [drm]
>>>>>> [ 632.680431] drm_ioctl+0x3e4/0x450 [drm]
>>>>>> [ 632.680485] ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
>>>>>> [ 632.680537] amdgpu_drm_ioctl+0x4c/0x80 [amdgpu]
>>>>>> [ 632.680542] do_vfs_ioctl+0xa4/0x600
>>>>>> [ 632.680546] ? SyS_futex+0x7f/0x180
>>>>>> [ 632.680549] SyS_ioctl+0x79/0x90
>>>>>> [ 632.680554] entry_SYSCALL_64_fastpath+0x24/0xab
>>>>>>
>>>>>> Signed-off-by: Junwei Zhang <Jerry.Zhang at amd.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>>> index 57d4da6..b293809 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>>> @@ -1212,7 +1212,7 @@ static struct ttm_tt *amdgpu_ttm_tt_create(struct
>>>>>> ttm_buffer_object *bo,
>>>>>>  	gtt->ttm.ttm.func = &amdgpu_backend_func;
>>>>>>  	/* allocate space for the uninitialized page entries */
>>>>>> -	if (ttm_sg_tt_init(&gtt->ttm, bo, page_flags)) {
>>>>>> +	if (ttm_dma_tt_init(&gtt->ttm, bo, page_flags)) {
>>>>>>  		kfree(gtt);
>>>>>>  		return NULL;
>>>>>>  	}
>>>>>
>>>
>
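For illustration, the allocation mismatch discussed in this thread can be modelled in a few lines of plain C. This is a minimal sketch under stated assumptions: struct model_tt and the model_* helpers are hypothetical stand-ins, not TTM's real structures or functions, and they only mimic the behaviour as described here (an SG BO set up through ttm_sg_tt_init() gets a dma_address array but no pages array, ttm_dma_tt_init() allocates both, and the populate path writes into both arrays).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of the two per-BO arrays discussed in the thread.
 * NOT the kernel's struct ttm_dma_tt; all names are illustrative only. */
struct model_tt {
	size_t num_pages;
	void **pages;          /* stands in for ttm->pages */
	uint64_t *dma_address; /* stands in for gtt->ttm.dma_address */
	int is_sg;             /* BO imported via PRIME (SG-backed) */
};

/* Mimics ttm_sg_tt_init() as described in the thread: an SG BO only gets
 * the dma_address array, so pages stays NULL. */
static int model_sg_tt_init(struct model_tt *tt, size_t n, int is_sg)
{
	tt->num_pages = n;
	tt->is_sg = is_sg;
	tt->pages = is_sg ? NULL : calloc(n, sizeof(*tt->pages));
	tt->dma_address = calloc(n, sizeof(*tt->dma_address));
	if (!tt->dma_address || (!is_sg && !tt->pages))
		return -1;
	return 0;
}

/* Mimics ttm_dma_tt_init(), which the patch switches to: both arrays are
 * allocated whether or not the BO is SG-backed. */
static int model_dma_tt_init(struct model_tt *tt, size_t n, int is_sg)
{
	tt->num_pages = n;
	tt->is_sg = is_sg;
	tt->pages = calloc(n, sizeof(*tt->pages));
	tt->dma_address = calloc(n, sizeof(*tt->dma_address));
	return (tt->pages && tt->dma_address) ? 0 : -1;
}

/* Mimics the populate step that fills both arrays from the imported
 * sg_table (drm_prime_sg_to_page_addr_arrays() in the trace). Where the
 * kernel would write through a NULL pages pointer, the model returns -1. */
static int model_populate(struct model_tt *tt)
{
	size_t i;

	if (!tt->pages || !tt->dma_address)
		return -1; /* corresponds to the NULL dereference in the trace */
	for (i = 0; i < tt->num_pages; i++) {
		tt->pages[i] = (void *)(uintptr_t)(i + 1); /* fake non-NULL page pointer */
		tt->dma_address[i] = (uint64_t)i << 12;    /* fake 4 KiB-spaced address */
	}
	return 0;
}

/* Releases both arrays; free(NULL) is a no-op for the SG case. */
static void model_fini(struct model_tt *tt)
{
	free(tt->pages);
	free(tt->dma_address);
	tt->pages = NULL;
	tt->dma_address = NULL;
}
```

For example, a model_tt set up with model_sg_tt_init(&tt, 4, 1) ends up with tt.pages == NULL and model_populate() fails, while the same BO set up with model_dma_tt_init() populates successfully. The model also reflects the NAK: allocating the pages array makes the populate step succeed for an imported BO, but it does nothing to stop populate from running on that BO in the first place.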