[PATCH] drm/amdgpu: disable job timeout on GPU reset disabled

Hi Christian,

The messages printed on timeout are errors, not just warnings, even though we did not see any real problem (in the dgemm special case). That is why we called them confusing.
I suppose you want a fix like my previous patch (see attachment).
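
To make the per-queue idea concrete, here is a minimal userspace sketch of picking a timeout by ring type. It is not the attached patch and not actual amdgpu code: the enum, its values, and the function name `sketch_job_timeout_ms` are hypothetical and exist only for illustration. It assumes compute rings should get no effective timeout while every other ring keeps the global lockup timeout (10000 ms by default in the diff quoted below).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical ring types; the real driver defines its own enum. */
enum sketch_ring_type {
	SKETCH_RING_GFX,
	SKETCH_RING_COMPUTE,
	SKETCH_RING_SDMA,
};

/*
 * Return the job timeout in milliseconds for a ring of the given type.
 * Compute rings get INT_MAX (effectively no timeout), since long-running
 * jobs such as dgemm are expected there. All other rings keep the global
 * lockup timeout, so hang messages still reach the kernel log.
 */
static int sketch_job_timeout_ms(enum sketch_ring_type type,
				 int lockup_timeout_ms)
{
	/* A non-positive global timeout is outside this sketch's scope. */
	assert(lockup_timeout_ms > 0);

	if (type == SKETCH_RING_COMPUTE)
		return INT_MAX;

	return lockup_timeout_ms;
}
```

With this shape, the gfx and SDMA rings would still report hangs when GPU recovery is disabled, and only compute jobs would be exempt.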

Regards,
Evan
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com]
> Sent: Monday, March 19, 2018 5:42 PM
> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: disable job timeout on GPU reset
> disabled
> 
> Am 19.03.2018 um 07:08 schrieb Evan Quan:
> > Under some heavy compute workloads (e.g. the dgemm test), the ASIC can
> > take more than 10 seconds to finish a single dispatched job, which
> > triggers the timeout. The resulting messages are quite confusing,
> > although the timeout does not seem to cause any real problems.
> > As a quick workaround, disable the timeout when GPU reset is
> > disabled.
> 
> NAK, I enabled those warnings intentionally, even when GPU recovery is
> disabled, to have a hint in the logs about what goes wrong.
> 
> Please only increase the timeout for the compute queues and/or add a
> separate timeout for them.
> 
> Regards,
> Christian.
> 
> 
> >
> > Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
> > Signed-off-by: Evan Quan <evan.quan at amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 8bd9c3f..9d6a775 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct amdgpu_device *adev)
> >   		amdgpu_lockup_timeout = 10000;
> >   	}
> >
> > +	/*
> > +	 * Disable timeout when GPU reset is disabled to avoid confusing
> > +	 * timeout messages in the kernel log.
> > +	 */
> > +	if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
> > +		amdgpu_lockup_timeout = INT_MAX;
> > +
> >   	adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
> >   }
> >

-------------- next part --------------
An embedded message was scrubbed...
From: "Quan, Evan" <Evan.Quan@xxxxxxx>
Subject: [PATCH] drm/amdgpu: no job timeout setting on compute queues
Date: Fri, 16 Mar 2018 04:52:32 +0000
Size: 4757
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20180320/cb42d918/attachment.mht>