Hi Christian,

The messages printed on timeout are errors, not just warnings, even though we did not see any real problem (in the dgemm special case). That is why we call them confusing.
I suppose you want a fix like my previous patch (see attachment).

Regards,
Evan

> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com]
> Sent: Monday, March 19, 2018 5:42 PM
> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: disable job timeout on GPU reset disabled
>
> On 19.03.2018 at 07:08, Evan Quan wrote:
> > Under some heavy computing workloads (the dgemm test), it takes the
> > asic more than 10 seconds to finish a single dispatched job, which
> > triggers the timeout. It's quite confusing, although it does not seem to
> > bring any real problems.
> > As a quick workaround, we choose to disable the timeout when GPU reset is
> > disabled.
>
> NAK, I enabled those warnings intentionally, even when GPU recovery is
> disabled, to have a hint in the logs about what goes wrong.
>
> Please only increase the timeout for the compute queues and/or add a
> separate timeout for them.
>
> Regards,
> Christian.
>
> >
> > Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
> > Signed-off-by: Evan Quan <evan.quan at amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 8bd9c3f..9d6a775 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct amdgpu_device *adev)
> >  		amdgpu_lockup_timeout = 10000;
> >  	}
> >
> > +	/*
> > +	 * Disable timeout when GPU reset is disabled to avoid confusing
> > +	 * timeout messages in the kernel log.
> > +	 */
> > +	if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
> > +		amdgpu_lockup_timeout = INT_MAX;
> > +
> >  	adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
> >  }
> >

-------------- next part --------------
An embedded message was scrubbed...
From: "Quan, Evan" <Evan.Quan@xxxxxxx>
Subject: [PATCH] drm/amdgpu: no job timeout setting on compute queues
Date: Fri, 16 Mar 2018 04:52:32 +0000
Size: 4757
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180320/cb42d918/attachment.mht>
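
For illustration only, the alternative Christian asks for (a longer or separate timeout for compute queues rather than disabling the timeout globally) could be sketched roughly as below. This is a standalone C sketch, not amdgpu code: the enum, the function name and the 60000 ms compute value are hypothetical and chosen only for the example. Only the 10000 ms default is taken from the quoted hunk.

```c
/*
 * Hypothetical sketch: choose a job timeout per queue class instead of
 * disabling the timeout for every queue. None of these names exist in
 * the amdgpu driver; they are assumptions made for this example.
 */

/* Hypothetical queue classes (not the real amdgpu ring types). */
enum queue_type {
	QUEUE_GFX,
	QUEUE_COMPUTE,
};

/* 10000 ms matches the default in the quoted hunk. */
#define GFX_TIMEOUT_MS		10000
/* Assumed value: long enough for a dgemm-style job of 10+ seconds. */
#define COMPUTE_TIMEOUT_MS	60000

/* Return the job timeout in milliseconds for the given queue class. */
static int job_timeout_ms(enum queue_type type)
{
	return type == QUEUE_COMPUTE ? COMPUTE_TIMEOUT_MS : GFX_TIMEOUT_MS;
}
```

With a split like this, timeout messages would still be logged for every queue whether or not GPU recovery is enabled, and only long-running compute jobs would get the extra headroom.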