Hi Paul, Answers as below: -----Original Message----- From: Paul Menzel <pmenzel+amd-gfx@xxxxxxxxxxxxx> Sent: Wednesday, August 8, 2018 10:29 PM To: Deng, Emily <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org Cc: Liu, Monk <Monk.Liu at amd.com> Subject: Re: [PATCH] drm/amdgpu/sriov: give 8s for recover vram under RUNTIME Dear Deng, On 08/08/18 04:13, Emily Deng wrote: > Modify the commit message I guess the line above is a leftover from some template, and can be removed? [Emily] It is removed > Extend the timeout for recovering vram bos from shadows on sr-iov > to cover the worst case scenario for timeslices and VFs > > Under runtime, the wait fence time could be quite long when > other VFs are in exclusive mode. For example, for 4 VF, every > VF's exclusive timeout time is set to 3s, then the worst case is > 9s. If the VF number is more than 4,then the worst case time will > be longer. Nit: Missing space after the comma. How did you get to nine seconds? Isnâ??t it four times three, which is twelve? [Emily] It is 3 times three, not 4, as the VF only need to wait the other three VF's exclusive timeout. > The 8s is the test data, with setting to 8s, it will pass the TDR > test for 1000 times. > > SWDEV-161490 > > Change-Id: Ifc32d56ca7fde01b1f4fe2b0db6959b51909008a > Signed-off-by: Monk Liu <Monk.Liu at amd.com> > Signed-off-by: Emily Deng <Emily.Deng at amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 1d933db..ef82ad1 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -3124,7 +3124,7 @@ static int amdgpu_device_handle_vram_lost(struct amdgpu_device *adev) > long tmo; > > if (amdgpu_sriov_runtime(adev)) > - tmo = msecs_to_jiffies(amdgpu_lockup_timeout); > + tmo = msecs_to_jiffies(8000); Actually, I do not understand the change at all. Isnâ??t that a module parameter? ``` $ git grep amdgpu_lockup_timeout drivers/gpu/drm/amd/amdgpu/amdgpu.h:extern int amdgpu_lockup_timeout; drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: if (amdgpu_lockup_timeout == 0) { drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: amdgpu_lockup_timeout = 10000; drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: tmo = msecs_to_jiffies(amdgpu_lockup_timeout); drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:int amdgpu_lockup_timeout = 10000; drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:module_param_named(lockup_timeout, amdgpu_lockup_timeout, int, 0444); drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c: timeout = msecs_to_jiffies(amdgpu_lockup_timeout); drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c: if (amdgpu_lockup_timeout == 0) ``` So, if itâ??s not set, the time-out is set to 10(!) seconds anyway, isnâ??t it? What am I missing? > else > tmo = msecs_to_jiffies(100); > > In my opinion, such time-outs need to have a big FIXME added to it, and VFâ??s exclusive time-outs should be drastically decreased. Kind regards, Paul