Re: [PATCH] drm/amdgpu: change the fence ring wait timeout

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dear Roy,


Am 13.01.21 um 07:36 schrieb Roy Sun:
This fix bug where when the engine hang, the fence ring will wait without quit and cause kernel crash

Please wrap commit message body after 75 characters. (checkpatch.pl should have shown you that.)

Some typos:

1.  fix*es*
2.  hang*s*
3.  Please add a dot/period at the end of sentences.

If there is a Linux kernel crash, please add the trace.

Please elaborate also. Where and why does the Linux kernel crash, and how does your patch fix this.

How can the bug be reproduced? On what hardware did you see it?

Signed-off-by: Roy Sun <Roy.Sun@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 48 ++++++++++++++++++++---
  1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 6b0aeee61b8b..738ea65077ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -41,6 +41,8 @@
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
+#define AMDGPU_FENCE_TIMEOUT msecs_to_jiffies(1000)
+#define AMDGPU_FENCE_GFX_XGMI_TIMEOUT msecs_to_jiffies(2000)

Where did you get the values from, or why did you choose them.

  /*
   * Fences
   * Fences mark an event in the GPUs pipeline and are used
@@ -104,6 +106,38 @@ static void amdgpu_fence_write(struct amdgpu_ring *ring, u32 seq)
  		*drv->cpu_addr = cpu_to_le32(seq);
  }
+/**
+ * amdgpu_fence_wait_timeout - get the fence wait timeout
+ *
+ * @ring: ring the fence is associated with
+ *
+ * Returns the value of the fence wait timeout.
+ */
+long amdgpu_fence_wait_timeout(struct amdgpu_ring *ring)

See below, but it should be `signed long`.

+{
+	long tmo_gfx, tmo_mm, tmo;
+	struct amdgpu_device *adev = ring->adev;
+	tmo_mm = tmo_gfx = AMDGPU_FENCE_TIMEOUT;
+	if (amdgpu_sriov_vf(adev)) {
+		tmo_mm = 8 * AMDGPU_FENCE_TIMEOUT;
+	}

See coding style. No {} needed for one line statements.

+	if (amdgpu_sriov_runtime(adev)) {
+		tmo_gfx = 8 * AMDGPU_FENCE_TIMEOUT;

So why not define a special SRIOV macro too. But add a comment, why it should be eight times that.

+	} else if (adev->gmc.xgmi.hive_id) {
+		tmo_gfx = AMDGPU_FENCE_GFX_XGMI_TIMEOUT;
+	}

Ditto.

+	if (ring->funcs->type == AMDGPU_RING_TYPE_UVD ||
+		ring->funcs->type == AMDGPU_RING_TYPE_VCE ||
+		ring->funcs->type == AMDGPU_RING_TYPE_UVD_ENC ||
+		ring->funcs->type == AMDGPU_RING_TYPE_VCN_DEC ||
+		ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC ||
+		ring->funcs->type == AMDGPU_RING_TYPE_VCN_JPEG)

Align on `ring`?

+		tmo = tmo_mm;
+	else
+		tmo = tmo_gfx;
+	return tmo;

The return could happen in the branches.

+}
+
  /**
   * amdgpu_fence_read - read a fence value
   *
@@ -166,10 +200,12 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f,
  		rcu_read_unlock();
if (old) {
-			r = dma_fence_wait(old, false);
+			long timeout;
+			timeout = amdgpu_fence_wait_timeout(ring);
+			r = dma_fence_wait_timeout(old, false, timeout);

The signature is `signed long` as far as I see.

  			dma_fence_put(old);
  			if (r)
-				return r;
+				return r < 0 ? r : 0;
  		}
  	}
@@ -343,10 +379,12 @@ int amdgpu_fence_wait_empty(struct amdgpu_ring *ring)
  		return 0;
  	}
  	rcu_read_unlock();
-
-	r = dma_fence_wait(fence, false);
+	
+	long timeout;
+	timeout = amdgpu_fence_wait_timeout(ring);
+	r = dma_fence_wait_timeout(fence, false, timeout);
  	dma_fence_put(fence);
-	return r;
+	return r < 0 ? r : 0;
  }
/**


It looks like this is your first patch submission, so welcome to the Linux kernel community. Please keep in mind, that reviewer time is precious and scarce, and that formal and simple mistakes should not happen. No idea, if AMD has some kind of “onboarding” process for their newly employed Linux kernel contributors.

I am looking forward to your the patch iteration and your next contributions.


Kind regards,

Paul
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux