Hi Christian, If " tmo = dma_fence_wait_timeout(fence, false, tmo); " was executed inside list_for_each_entry, the value of tmo might be changed. But " tmo = dma_fence_wait_timeout(fence, false, tmo); " might be called after exiting the loop of list_for_each_entry. It might pass a different value of tmo into dma_fence_wait_timeout. BR, Wentao -----Original Message----- From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx> Sent: Tuesday, April 2, 2019 3:01 PM To: Deng, Emily <Emily.Deng@xxxxxxx>; Lou, Wentao <Wentao.Lou@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Subject: Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list Yeah, agree that is much closer to a correct solution. But even better would be to correctly update the timeout as well, e.g: tmo = dma_fence_wait_timeout(fence, false, tmo); dma_fence_put(fence); fence = next; if (tmo == 0) r = -ETIMEDOUT; break } else if (tmo < 0) { r = tmo; break; } That we restart the timeout for each wait looks like a rather problematic bug to me as well. Christian. Am 02.04.19 um 06:03 schrieb Deng, Emily: > Maybe it will be better to add follow check, and change “if (r <= 0 || tmo <= 0) " to "if (r <0 || tmo <= 0)". > r = dma_fence_wait_timeout(f, false, timeout); > if (r == 0) { > r = -ETIMEDOUT; > break; > } else if (r < 0) { > break; > } > > Best wishes > Emily Deng > > >> -----Original Message----- >> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of >> wentalou >> Sent: Monday, April 1, 2019 4:59 PM >> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx >> Cc: Lou, Wentao <Wentao.Lou@xxxxxxx> >> Subject: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed >> if only one node in shadow_list >> >> amdgpu_bo_restore_shadow would assign zero to r if succeeded. >> r would remain zero if there is only one node in shadow_list. >> current code would always return failure when r <= 0. >> >> Change-Id: Iae6880e7c78b71fde6a6754c69665c2e312a80a5 >> Signed-off-by: Wentao Lou <Wentao.Lou@xxxxxxx> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index c4c61e9..5cf21a4 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -3171,6 +3171,7 @@ static int amdgpu_device_recover_vram(struct >> amdgpu_device *adev) >> struct dma_fence *fence = NULL, *next = NULL; >> struct amdgpu_bo *shadow; >> long r = 1, tmo; >> + bool single_shadow = false; >> >> if (amdgpu_sriov_runtime(adev)) >> tmo = msecs_to_jiffies(8000); >> @@ -3194,10 +3195,12 @@ static int amdgpu_device_recover_vram(struct >> amdgpu_device *adev) >> r = dma_fence_wait_timeout(fence, false, tmo); >> dma_fence_put(fence); >> fence = next; >> + single_shadow = false; >> if (r <= 0) >> break; >> } else { >> fence = next; >> + single_shadow = true; >> } >> } >> mutex_unlock(&adev->shadow_list_lock); >> @@ -3206,7 +3209,8 @@ static int amdgpu_device_recover_vram(struct >> amdgpu_device *adev) >> tmo = dma_fence_wait_timeout(fence, false, tmo); >> dma_fence_put(fence); >> >> - if (r <= 0 || tmo <= 0) { >> + /* r would be zero even if amdgpu_bo_restore_shadow succeeded when >> single shadow in list */ >> + if (r < 0 || (r == 0 && !single_shadow) || tmo <= 0) { >> DRM_ERROR("recover vram bo from shadow failed\n"); >> return -EIO; >> } >> -- >> 2.7.4 >> >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > _______________________________________________ > amd-gfx mailing list > amd-gfx@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx