Mhm, that sounds like another bug to me which we need to investigate. Probably a race during freeing of shadow allocations. Christian. Am 03.04.19 um 08:35 schrieb Lou, Wentao: > Hi Christian, > > Sometimes shadow->parent would be NULL in my testbed, but not reproduce today... > Just sent out another patch following your advice. > Thanks. > > BR, > Wentao > > > -----Original Message----- > From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx> > Sent: Tuesday, April 2, 2019 6:36 PM > To: Lou, Wentao <Wentao.Lou@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Subject: Re: [PATCH] amdgpu_device_recover_vram always failed if only one node in shadow_list > > Am 02.04.19 um 11:19 schrieb wentalou: >> amdgpu_bo_restore_shadow would assign zero to r if succeeded. >> r would remain zero if there is only one node in shadow_list. >> current code would always return failure when r <= 0. >> restart the timeout for each wait was a rather problematic bug as well. >> The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop. >> meanwhile, fix Call Trace by NULL of shadow->parent. >> >> Change-Id: I7e836ec7ab6cd0f069aac24f88e454e906637541 >> Signed-off-by: Wentao Lou <Wentao.Lou@xxxxxxx> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 ++++++++++----- >> 1 file changed, 10 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index c4c61e9..5a2dc44 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3183,7 +3183,7 @@ static int amdgpu_device_recover_vram(struct >> amdgpu_device *adev) >> >> /* No need to recover an evicted BO */ >> if (shadow->tbo.mem.mem_type != TTM_PL_TT || >> - shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM) >> + shadow->parent == NULL || shadow->parent->tbo.mem.mem_type != >> +TTM_PL_VRAM) > That doesn't looks like a good idea to me. Did you actually run into this issue? > >> continue; >> >> r = amdgpu_bo_restore_shadow(shadow, &next); @@ -3191,11 +3191,16 >> @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev) >> break; >> >> if (fence) { >> - r = dma_fence_wait_timeout(fence, false, tmo); >> + tmo = dma_fence_wait_timeout(fence, false, tmo); >> dma_fence_put(fence); >> fence = next; >> - if (r <= 0) >> + if (tmo == 0) { >> + r = -ETIMEDOUT; >> break; >> + } else if (tmo < 0) { >> + r = tmo; >> + break; >> + } >> } else { >> fence = next; >> } >> @@ -3206,8 +3211,8 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev) >> tmo = dma_fence_wait_timeout(fence, false, tmo); >> dma_fence_put(fence); >> >> - if (r <= 0 || tmo <= 0) { >> - DRM_ERROR("recover vram bo from shadow failed\n"); >> + if (r < 0 || tmo <= 0) { >> + DRM_ERROR("recover vram bo from shadow failed, tmo is %d\n", tmo); > Maybe print both r and tmo in the message. > > Regards, > Christian. > >> return -EIO; >> } >> _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx