Hi Evan,

that one is perfect if you ask me. Just reading up on the history of that patch: Alex, what was your concern with it?

Regarding printing this as an error, that's a really good point as well. We should probably reduce it to a warning or even info severity.

Regards,
Christian.

On 20.03.2018 at 03:11, Quan, Evan wrote:
> Hi Christian,
>
> The messages printed on timeout are errors, not just warnings, although we
> did not see any real problem (for the dgemm special case). That's why we
> said it was confusing.
> And I suppose you want a fix like my previous patch (see attachment).
>
> Regards,
> Evan
>> -----Original Message-----
>> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com]
>> Sent: Monday, March 19, 2018 5:42 PM
>> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: disable job timeout on GPU reset
>> disabled
>>
>> On 19.03.2018 at 07:08, Evan Quan wrote:
>>> Under some heavy computing workloads (the dgemm test), the ASIC takes
>>> more than 10 seconds to finish a single dispatched job, which triggers
>>> the timeout. This is quite confusing, although it does not seem to
>>> cause any real problems.
>>> As a quick workaround, we choose to disable the timeout when GPU reset
>>> is disabled.
>> NAK, I intentionally enabled those warnings even when GPU recovery is
>> disabled, to have a hint in the logs about what goes wrong.
>>
>> Please only increase the timeout for the compute queue and/or add a
>> separate timeout for them.
>>
>> Regards,
>> Christian.
>>
>>
>>> Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
>>> Signed-off-by: Evan Quan <evan.quan at amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
>>>  1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 8bd9c3f..9d6a775 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct amdgpu_device *adev)
>>>  		amdgpu_lockup_timeout = 10000;
>>>  	}
>>>  
>>> +	/*
>>> +	 * Disable timeout when GPU reset is disabled to avoid confusing
>>> +	 * timeout messages in the kernel log.
>>> +	 */
>>> +	if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
>>> +		amdgpu_lockup_timeout = INT_MAX;
>>> +
>>>  	adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
>>>  }
>>>  
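A minimal sketch of the alternative Christian asks for (a separate, longer timeout for compute queues instead of disabling the timeout), written as standalone C rather than against the real driver. Everything here is hypothetical: the names lockup_timeout_ms, compute_timeout_ms, enum ring_kind, sanitize_timeout() and job_timeout_ms() are illustrative, and the 60000 ms compute default is an arbitrary example value. Only the 10000 ms fallback comes from the patch above.

```c
#include <assert.h>

/* Hypothetical ring classes; not the real amdgpu ring types. */
enum ring_kind { RING_GFX, RING_COMPUTE, RING_SDMA };

/* Fallback used by the patch context above. */
#define DEFAULT_LOCKUP_TIMEOUT_MS 10000
/* Illustrative value only: long enough for a multi-second dgemm job. */
#define DEFAULT_COMPUTE_TIMEOUT_MS 60000

/* Hypothetical module parameters; 0 or negative means "use the default". */
static int lockup_timeout_ms = 0;
static int compute_timeout_ms = 0;

/* Return the user-supplied timeout if it is positive, else the fallback. */
static int sanitize_timeout(int value, int fallback)
{
	return value > 0 ? value : fallback;
}

/*
 * Pick the job timeout for a ring: compute rings get their own, longer
 * value, so heavy compute jobs do not trigger the generic lockup timeout.
 */
static int job_timeout_ms(enum ring_kind kind)
{
	int t;

	if (kind == RING_COMPUTE)
		t = sanitize_timeout(compute_timeout_ms, DEFAULT_COMPUTE_TIMEOUT_MS);
	else
		t = sanitize_timeout(lockup_timeout_ms, DEFAULT_LOCKUP_TIMEOUT_MS);

	assert(t > 0); /* a sanitized timeout is always positive */
	return t;
}
```

With both parameters left at 0, job_timeout_ms(RING_GFX) yields 10000 and job_timeout_ms(RING_COMPUTE) yields 60000. Unlike the INT_MAX approach in the patch, the timeout messages still fire for a genuinely hung job, which keeps the hint in the logs Christian wants.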