Re: [PATCH] drm/amdgpu: extend the default timeout for kernel compute queues

On 20.04.23 at 14:56, Alex Deucher wrote:
On Thu, Apr 20, 2023 at 5:19 AM Feifei Xu <Feifei.Xu@xxxxxxx> wrote:
Extend to 120s. The default timeout value should also be extended if the
compute shader execution time is extended. Otherwise some stress tests will
trigger a compute ring timeout in software.
I think that's probably too long.  2 minutes is a long time to have a
hung system.  I think we should rework the tests or use ROCm for long
running test cases.

Yeah, agreed. This has come up multiple times now, and even 60000 ms is actually way too much.

You need to keep in mind that this has dependencies, and it essentially means that in case of a bug the system sometimes needs 60 seconds to become responsive again.

So clearly a NAK for this.

Christian.


Alex

Signed-off-by: Feifei Xu <Feifei.Xu@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e536886f6d42..1f98b4b0a549 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3475,7 +3475,7 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)

         /*
          * By default timeout for non compute jobs is 10000
-        * and 60000 for compute jobs.
+        * and 120000 for compute jobs.
          * In SR-IOV or passthrough mode, timeout for compute
          * jobs are 60000 by default.
          */
@@ -3485,7 +3485,7 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
                 adev->compute_timeout = amdgpu_sriov_is_pp_one_vf(adev) ?
                                         msecs_to_jiffies(60000) : msecs_to_jiffies(10000);
         else
-               adev->compute_timeout =  msecs_to_jiffies(60000);
+               adev->compute_timeout =  msecs_to_jiffies(120000);

         if (strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH)) {
                 while ((timeout_setting = strsep(&input, ",")) &&
--
2.34.1
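
For reference, the tail of the hunk above shows that the function also parses a comma-separated override string, which as far as I know comes from the amdgpu lockup_timeout module parameter. The following is only a minimal userspace sketch of that kind of parsing, not the driver code. The struct, the enum, the function names, the field order (GFX, compute, SDMA, video) and the "0 keeps the default, negative disables the timeout" convention are assumptions made for illustration. It splits with strchr() instead of strsep() so that it builds without glibc feature macros, and it does not model any extra rules the real driver may apply (for example, how a single value is spread across queues).

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-queue timeouts, in milliseconds. */
enum { Q_GFX, Q_COMPUTE, Q_SDMA, Q_VIDEO, Q_COUNT };

struct timeouts_ms {
	long q[Q_COUNT];
};

/* Bare-metal defaults as quoted in the patch comment: 10000 ms, 60000 ms for compute. */
static void timeouts_set_defaults(struct timeouts_ms *t)
{
	t->q[Q_GFX] = 10000;
	t->q[Q_COMPUTE] = 60000;
	t->q[Q_SDMA] = 10000;
	t->q[Q_VIDEO] = 10000;
}

/*
 * Parse up to Q_COUNT comma-separated values into t.
 * Assumed convention: 0 keeps the current value, a negative value means
 * "no timeout" (stored as LONG_MAX here).
 * Returns the number of fields consumed, or -1 on a malformed field.
 * The input buffer is modified in place.
 */
static int timeouts_parse(struct timeouts_ms *t, char *input)
{
	int index = 0;

	if (input == NULL || *input == '\0')
		return 0; /* nothing to override */

	while (input != NULL && index < Q_COUNT) {
		char *field = input;
		char *comma = strchr(input, ',');
		char *end;
		long ms;

		if (comma != NULL) {
			*comma = '\0'; /* terminate this field in place */
			input = comma + 1;
		} else {
			input = NULL; /* this was the last field */
		}

		ms = strtol(field, &end, 10);
		if (end == field || *end != '\0')
			return -1; /* empty or non-numeric field */

		if (ms < 0)
			t->q[index] = LONG_MAX;
		else if (ms > 0)
			t->q[index] = ms;
		/* ms == 0: leave the default untouched */

		index++;
	}

	return index;
}
```

Under these assumptions, a stress-test setup that really needs a longer compute timeout could pass something like "0,120000" for its own runs and leave the default for everyone else at 60000 ms.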




