Hi Christian, Sometimes shadow->parent would be NULL in my testbed, but not reproduce today... Just sent out another patch following your advice. Thanks. BR, Wentao -----Original Message----- From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx> Sent: Tuesday, April 2, 2019 6:36 PM To: Lou, Wentao <Wentao.Lou@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Subject: Re: [PATCH] amdgpu_device_recover_vram always failed if only one node in shadow_list Am 02.04.19 um 11:19 schrieb wentalou: > amdgpu_bo_restore_shadow would assign zero to r if succeeded. > r would remain zero if there is only one node in shadow_list. > current code would always return failure when r <= 0. > restart the timeout for each wait was a rather problematic bug as well. > The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop. > meanwhile, fix Call Trace by NULL of shadow->parent. > > Change-Id: I7e836ec7ab6cd0f069aac24f88e454e906637541 > Signed-off-by: Wentao Lou <Wentao.Lou@xxxxxxx> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 ++++++++++----- > 1 file changed, 10 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index c4c61e9..5a2dc44 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -3183,7 +3183,7 @@ static int amdgpu_device_recover_vram(struct > amdgpu_device *adev) > > /* No need to recover an evicted BO */ > if (shadow->tbo.mem.mem_type != TTM_PL_TT || > - shadow->parent->tbo.mem.mem_type != TTM_PL_VRAM) > + shadow->parent == NULL || shadow->parent->tbo.mem.mem_type != > +TTM_PL_VRAM) That doesn't looks like a good idea to me. Did you actually run into this issue? > continue; > > r = amdgpu_bo_restore_shadow(shadow, &next); @@ -3191,11 +3191,16 > @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev) > break; > > if (fence) { > - r = dma_fence_wait_timeout(fence, false, tmo); > + tmo = dma_fence_wait_timeout(fence, false, tmo); > dma_fence_put(fence); > fence = next; > - if (r <= 0) > + if (tmo == 0) { > + r = -ETIMEDOUT; > break; > + } else if (tmo < 0) { > + r = tmo; > + break; > + } > } else { > fence = next; > } > @@ -3206,8 +3211,8 @@ static int amdgpu_device_recover_vram(struct amdgpu_device *adev) > tmo = dma_fence_wait_timeout(fence, false, tmo); > dma_fence_put(fence); > > - if (r <= 0 || tmo <= 0) { > - DRM_ERROR("recover vram bo from shadow failed\n"); > + if (r < 0 || tmo <= 0) { > + DRM_ERROR("recover vram bo from shadow failed, tmo is %d\n", tmo); Maybe print both r and tmo in the message. Regards, Christian. > return -EIO; > } > _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx