Am 02.04.19 um 11:23 schrieb Lou, Wentao: > Thanks Christian. > You mean msecs_to_jiffies(8000) should be time to complete whole loop, not each loop. Yeah, a normal desktop system can easily have more than 10000 BOs in that list. So the total timeout could be more than a day, which is a bit long :) > Just sent out another patch for review. Going to take a look. Regards, Christian. > Thanks. > > BR, > Wentao > > > -----Original Message----- > From: Koenig, Christian <Christian.Koenig@xxxxxxx> > Sent: Tuesday, April 2, 2019 3:39 PM > To: Lou, Wentao <Wentao.Lou@xxxxxxx>; Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Subject: Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always failed if only one node in shadow_list > > Yeah, exactly that's what should happen here. > > The value of tmo SHOULD be changed, otherwise we wait tmo jiffies on each loop. > > Christian. > > Am 02.04.19 um 09:29 schrieb Lou, Wentao: >> Hi Christian, >> >> If " tmo = dma_fence_wait_timeout(fence, false, tmo); " was executed inside list_for_each_entry, the value of tmo might be changed. >> But " tmo = dma_fence_wait_timeout(fence, false, tmo); " might be called after exiting the loop of list_for_each_entry. >> It might pass a different value of tmo into dma_fence_wait_timeout. >> >> BR, >> Wentao >> >> >> -----Original Message----- >> From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx> >> Sent: Tuesday, April 2, 2019 3:01 PM >> To: Deng, Emily <Emily.Deng@xxxxxxx>; Lou, Wentao >> <Wentao.Lou@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >> Subject: Re: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always >> failed if only one node in shadow_list >> >> Yeah, agree that is much closer to a correct solution. >> >> But even better would be to correctly update the timeout as well, e.g: >> >> tmo = dma_fence_wait_timeout(fence, false, tmo); dma_fence_put(fence); fence = next; if (tmo == 0) >> r = -ETIMEDOUT; >> break >> } else if (tmo < 0) { >> r = tmo; >> break; >> } >> >> That we restart the timeout for each wait looks like a rather problematic bug to me as well. >> >> Christian. >> >> Am 02.04.19 um 06:03 schrieb Deng, Emily: >>> Maybe it will be better to add follow check, and change “if (r <= 0 || tmo <= 0) " to "if (r <0 || tmo <= 0)". >>> r = dma_fence_wait_timeout(f, false, timeout); >>> if (r == 0) { >>> r = -ETIMEDOUT; >>> break; >>> } else if (r < 0) { >>> break; >>> } >>> >>> Best wishes >>> Emily Deng >>> >>> >>>> -----Original Message----- >>>> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of >>>> wentalou >>>> Sent: Monday, April 1, 2019 4:59 PM >>>> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>> Cc: Lou, Wentao <Wentao.Lou@xxxxxxx> >>>> Subject: [PATCH] drm/amdgpu: amdgpu_device_recover_vram always >>>> failed if only one node in shadow_list >>>> >>>> amdgpu_bo_restore_shadow would assign zero to r if succeeded. >>>> r would remain zero if there is only one node in shadow_list. >>>> current code would always return failure when r <= 0. >>>> >>>> Change-Id: Iae6880e7c78b71fde6a6754c69665c2e312a80a5 >>>> Signed-off-by: Wentao Lou <Wentao.Lou@xxxxxxx> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++- >>>> 1 file changed, 5 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> index c4c61e9..5cf21a4 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> @@ -3171,6 +3171,7 @@ static int amdgpu_device_recover_vram(struct >>>> amdgpu_device *adev) >>>> struct dma_fence *fence = NULL, *next = NULL; >>>> struct amdgpu_bo *shadow; >>>> long r = 1, tmo; >>>> + bool single_shadow = false; >>>> >>>> if (amdgpu_sriov_runtime(adev)) >>>> tmo = msecs_to_jiffies(8000); >>>> @@ -3194,10 +3195,12 @@ static int amdgpu_device_recover_vram(struct >>>> amdgpu_device *adev) >>>> r = dma_fence_wait_timeout(fence, false, tmo); >>>> dma_fence_put(fence); >>>> fence = next; >>>> + single_shadow = false; >>>> if (r <= 0) >>>> break; >>>> } else { >>>> fence = next; >>>> + single_shadow = true; >>>> } >>>> } >>>> mutex_unlock(&adev->shadow_list_lock); >>>> @@ -3206,7 +3209,8 @@ static int amdgpu_device_recover_vram(struct >>>> amdgpu_device *adev) >>>> tmo = dma_fence_wait_timeout(fence, false, tmo); >>>> dma_fence_put(fence); >>>> >>>> - if (r <= 0 || tmo <= 0) { >>>> + /* r would be zero even if amdgpu_bo_restore_shadow succeeded when >>>> single shadow in list */ >>>> + if (r < 0 || (r == 0 && !single_shadow) || tmo <= 0) { >>>> DRM_ERROR("recover vram bo from shadow failed\n"); >>>> return -EIO; >>>> } >>>> -- >>>> 2.7.4 >>>> >>>> _______________________________________________ >>>> amd-gfx mailing list >>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx