[PATCH] drm/amdgpu: disable job timeout on GPU reset disabled

Hi Evan,

That one is perfect if you ask me. I was just reading up on the history
of that patch. Alex, what was your concern with it?

Regarding printing this as an error, that's a really good point as well.
We should probably reduce it to warning or even info severity.

Regards,
Christian.

On 20.03.2018 at 03:11, Quan, Evan wrote:
> Hi Christian,
>
> The messages printed on timeout are errors, not just warnings, although we did not see any real problem (for the dgemm special case). That's why we call it confusing.
> And I suppose you want a fix like my previous patch (see attachment).
>
> Regards,
> Evan
>> -----Original Message-----
>> From: Christian König [mailto:ckoenig.leichtzumerken at gmail.com]
>> Sent: Monday, March 19, 2018 5:42 PM
>> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: disable job timeout on GPU reset
>> disabled
>>
>> On 19.03.2018 at 07:08, Evan Quan wrote:
>>> Under some heavy compute workloads (the dgemm test), it takes the
>>> ASIC more than 10 seconds to finish a single dispatched job, which
>>> triggers the timeout. The resulting messages are quite confusing,
>>> although the timeout does not seem to cause any real problems.
>>> As a quick workaround, we choose to disable the timeout when GPU reset
>>> is disabled.
>> NAK, I enabled those warnings intentionally, even when GPU recovery is
>> disabled, to have a hint in the logs about what goes wrong.
>>
>> Please only increase the timeout for the compute queues and/or add a
>> separate timeout for them.
>>
>> Regards,
>> Christian.
>>
>>
>>> Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
>>> Signed-off-by: Evan Quan <evan.quan at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
>>>    1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index 8bd9c3f..9d6a775 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct amdgpu_device *adev)
>>>    		amdgpu_lockup_timeout = 10000;
>>>    	}
>>>
>>> +	/*
>>> +	 * Disable timeout when GPU reset is disabled to avoid confusing
>>> +	 * timeout messages in the kernel log.
>>> +	 */
>>> +	if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
>>> +		amdgpu_lockup_timeout = INT_MAX;
>>> +
>>>    	adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
>>>    }
>>>
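
The alternative suggested in the review above (keep the timeout, but give compute queues their own, longer budget instead of disabling the timeout entirely) could look roughly like the following. This is only a minimal userspace sketch, not amdgpu code: `queue_kind`, `timeout_cfg`, `compute_ms` and `job_timeout_ms()` are hypothetical names made up for illustration, and treating a non-positive compute value as "not set" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue classes; the names are illustrative, not amdgpu's. */
enum queue_kind { QUEUE_GFX, QUEUE_COMPUTE };

/* Hypothetical per-queue timeout settings, in milliseconds. */
struct timeout_cfg {
	int gfx_ms;     /* general job timeout */
	int compute_ms; /* assumed separate budget for long-running compute jobs */
};

/*
 * Pick the job timeout for a queue.
 *
 * Assumption: a non-positive compute_ms means "no separate compute
 * timeout configured", in which case the general timeout applies.
 */
static int job_timeout_ms(const struct timeout_cfg *cfg, enum queue_kind kind)
{
	assert(cfg != NULL);

	if (kind == QUEUE_COMPUTE && cfg->compute_ms > 0)
		return cfg->compute_ms;

	return cfg->gfx_ms;
}
```

With `gfx_ms = 10000` (the value visible in the patch context) and an assumed `compute_ms = 60000`, a dgemm-style job running for a bit over 10 seconds would no longer hit the timeout on a compute queue, while a hang on any other queue would still leave a hint in the log.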


