[AMD Official Use Only - General]

For some Vulkan stress tests, it might not be possible to rewrite them using ROCm.
On second thought, 120s might be too risky, because the softlockup timeout is also set to 120s.
To support stress tests like the Vulkan stress test I recently saw on stressbench, we could shorten the 120s to a reasonable value such as 100s, which would also fix the software hang.

-----Original Message-----
From: Alex Deucher <alexdeucher@xxxxxxxxx>
Sent: Thursday, April 20, 2023 8:57 PM
To: Xu, Feifei <Feifei.Xu@xxxxxxx>
Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: extend the default timeout for kernel compute queues

On Thu, Apr 20, 2023 at 5:19 AM Feifei Xu <Feifei.Xu@xxxxxxx> wrote:
>
> Extend to 120s. The default timeout value should also extend if
> compute shader execution time extended. Otherwise some stress test
> will trigger compute ring timeout in software.

I think that's probably too long. 2 minutes is a long time to have a hung system. I think we should rework the tests or use ROCm for long running test cases.

Alex

>
> Signed-off-by: Feifei Xu <Feifei.Xu@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e536886f6d42..1f98b4b0a549 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3475,7 +3475,7 @@ static int
> amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>
>  	/*
>  	 * By default timeout for non compute jobs is 10000
> -	 * and 60000 for compute jobs.
> +	 * and 120000 for compute jobs.
>  	 * In SR-IOV or passthrough mode, timeout for compute
>  	 * jobs are 60000 by default.
>  	 */
> @@ -3485,7 +3485,7 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
>  		adev->compute_timeout = amdgpu_sriov_is_pp_one_vf(adev) ?
>  					msecs_to_jiffies(60000) : msecs_to_jiffies(10000);
>  	else
> -		adev->compute_timeout = msecs_to_jiffies(60000);
> +		adev->compute_timeout = msecs_to_jiffies(120000);
>
>  	if (strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH)) {
>  		while ((timeout_setting = strsep(&input, ",")) &&
> --
> 2.34.1
>