Re: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Oak,

[Oak] I am not familiar about the power gating sequence but from first glance, should the power gating sequence make sure that HW is ready (idle) for power gating before put the system to power gating?
The problem is that the hardware is actually idle when gated.

See what happens is the following:

1. Ring A sends an invalidate command to VM invalidation engine X.

2. VM invalidation engine X wakes up and is ungated because it now has work.

3. VM invalidation engine X finishes the invalidation and goes back to be gated again.

4. Now ring A polls for the invalidation on engine X to complete, but since it got back to be gated again it has forgotten that we have finished that invalidation. BAM! Ring A will poll forever.

Regards,
Christian.

Am 20.11.19 um 16:04 schrieb Zeng, Oak:
See an inline comment

Regards,
Oak

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Christian König
Sent: Wednesday, November 20, 2019 8:21 AM
To: Liu, Monk <Monk.Liu@xxxxxxx>; Zhu, Changfeng <Changfeng.Zhu@xxxxxxx>; Xiao, Jack <Jack.Xiao@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>; Huang, Shimmer <Xinmei.Huang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

Hi Monk,

this is a fix for power gating the MMHUB.

Basic problem is that the MMHUB can power gate while an invalidation is in progress
[Oak] I am not familiar about the power gating sequence but from first glance, should the power gating sequence make sure that HW is ready (idle) for power gating before put the system to power gating? E.g., before we put the system to power gating, should we enquiry each HW blocks to see whether the HW is idle? If not (like the case you mentioned some invalidation activities is still ongoing) the power gating condition is not mature and we should we wait. Or if the power gating is trigger/initiated by HW (I am not sure), HW should guarantee it is idle?

  which looses all bits in the ACK register and so deadlocks the engine waiting for the invalidation to finish.

This bug is hit immediately when we enable power gating of the MMHUB.

Regards,
Christian.

Am 20.11.19 um 14:18 schrieb Liu, Monk:
Hi Changfeng

Firs of all, there is no power-gating off circle involved in AMDGPU
SRIOV, since we don't allow VF/VM do such things so I do feel strange
why you post something like this Especially on VEGA10 serials which
looks doesn't have any issue on those gpu_flush part

Here is my questions for you:
1) Can you point me what issue had you been experienced ? and how to
repro the bug
2) if you do hit some issues, did you verified that your patch can fix it ?

besides

/Monk

-----邮件原件-----
发件人: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> 代表 Changfeng.Zhu
发送时间: 2019年11月20日 17:14
收件人: Koenig, Christian <Christian.Koenig@xxxxxxx>; Xiao, Jack
<Jack.Xiao@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; Huang, Ray
<Ray.Huang@xxxxxxx>; Huang, Shimmer <Xinmei.Huang@xxxxxxx>;
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
抄送: Zhu, Changfeng <Changfeng.Zhu@xxxxxxx>
主题: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in
amdgpu_virt

From: changzhu <Changfeng.Zhu@xxxxxxx>

It may lose gpuvm invalidate acknowldege state across power-gating off cycle. To avoid this issue in virt invalidation, add semaphore acquire before invalidation and semaphore release after invalidation.

Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
Signed-off-by: changzhu <Changfeng.Zhu@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++++++++++++++++++++++--  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    |  3 ++-
   3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index f04eb1a64271..70ffaf91cd12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device
*adev, uint32_t reg, uint32_t v)
void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
   					uint32_t reg0, uint32_t reg1,
-					uint32_t ref, uint32_t mask)
+					uint32_t ref, uint32_t mask,
+					uint32_t sem)
   {
   	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
   	struct amdgpu_ring *ring = &kiq->ring; @@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
   	uint32_t seq;
spin_lock_irqsave(&kiq->ring_lock, flags);
-	amdgpu_ring_alloc(ring, 32);
+	amdgpu_ring_alloc(ring, 60);
+
+	/*
+	 * It may lose gpuvm invalidate acknowldege state across power-gating
+	 * off cycle, add semaphore acquire before invalidation and semaphore
+	 * release after invalidation to avoid entering power gated state
+	 * to WA the Issue
+	 */
+
+	/* a read return value of 1 means semaphore acuqire */
+	if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+	    ring->funcs->vmhub == AMDGPU_MMHUB_1)
+	amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
+
   	amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
   					    ref, mask);
+	/*
+	 * add semaphore release after invalidation,
+	 * write with 0 means semaphore release
+	 */
+	if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+	    ring->funcs->vmhub == AMDGPU_MMHUB_1)
+	amdgpu_ring_emit_wreg(ring, sem, 0);
+
   	amdgpu_fence_emit_polling(ring, &seq);
   	amdgpu_ring_commit(ring);
   	spin_unlock_irqrestore(&kiq->ring_lock, flags); diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index b0b2bdc750df..bda6a2f37dc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, uint32_t reg);  void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v);  void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
   					uint32_t reg0, uint32_t rreg1,
-					uint32_t ref, uint32_t mask);
+					uint32_t ref, uint32_t mask,
+					uint32_t sem);
   int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool
init);  int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev,
bool init);  int amdgpu_virt_reset_gpu(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f25cd97ba5f2..1ae59af7836a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -448,9 +448,10 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
   			!adev->in_gpu_reset) {
   		uint32_t req = hub->vm_inv_eng0_req + eng;
   		uint32_t ack = hub->vm_inv_eng0_ack + eng;
+		uint32_t sem = hub->vm_inv_eng0_sem + eng;
amdgpu_virt_kiq_reg_write_reg_wait(adev, req, ack, tmp,
-				1 << vmid);
+						   1 << vmid, sem);
   		return;
   	}
--
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux