Re: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ok in this case we should just drop this patch.

Any objections on the other semaphore patch? IIRC we added the amdgpu_ring_emit_reg_write_reg_wait() especially to make sure that an invalidation can't be interrupted by a world switch.

When we add manual semaphore acquire/release before and after the invalidation that could break this quite badly.

Regards,
Christian.

Am 20.11.19 um 14:30 schrieb Liu, Monk:
Question for Emily and Monk: Do we support power gating of the MMHUB with SRIOV? I don't think so and when that's correct we could just drop this patch.
Any power gating if now allowed to be controlled by a VF in a guest VM ....

It is hypervisor driver's (gim) responsibility to conduct when can our hardware entering a power circle (e.g. BACO reset), and we have software mechanism to make sure
Power gating off circle shall only happen when all engine is idle (or any of them was hang) state.

And even some engine isn't hang (e.g. KIQ is still doing things like read register or gpu_flush_tlb , etc...) if GIM decide to power off GPU (BACO reset) then that's okay for KIQ, since
After BACO all engines would be re-init anyway

Thanks

/Monk

-----邮件原件-----
发件人: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
发送时间: 2019年11月20日 19:24
收件人: Zhu, Changfeng <Changfeng.Zhu@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Xiao, Jack <Jack.Xiao@xxxxxxx>; Zhou1, Tao <Tao.Zhou1@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>; Huang, Shimmer <Xinmei.Huang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deng, Emily <Emily.Deng@xxxxxxx>; Liu, Monk <Monk.Liu@xxxxxxx>
主题: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

Hi Changfeng,

[adding Monk and Emily as well].

I thought more about this and came to the conclusion that this won't work and might result in a lockup as well.

We are using the KIQ on SRIOV for GPUVM invalidation because we need an atomic read/modify/write cycle since we found that the invalidation engine is resetted with every world switch.

Now accessing the semaphore registers is not atomic any more and we could have a world switch in between grabbing the semaphore and sending the VM invalidation. That either won't work or could result in a lockup as well.

Question for Emily and Monk: Do we support power gating of the MMHUB with SRIOV? I don't think so and when that's correct we could just drop this patch.

Regards,
Christian.

Am 20.11.19 um 10:14 schrieb Changfeng.Zhu:
From: changzhu <Changfeng.Zhu@xxxxxxx>

It may lose gpuvm invalidate acknowldege state across power-gating off
cycle. To avoid this issue in virt invalidation, add semaphore acquire
before invalidation and semaphore release after invalidation.

Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
Signed-off-by: changzhu <Changfeng.Zhu@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++++++++++++++++++++++--
   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    |  3 ++-
   3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index f04eb1a64271..70ffaf91cd12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device
*adev, uint32_t reg, uint32_t v)
void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
   					uint32_t reg0, uint32_t reg1,
-					uint32_t ref, uint32_t mask)
+					uint32_t ref, uint32_t mask,
+					uint32_t sem)
   {
   	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
   	struct amdgpu_ring *ring = &kiq->ring; @@ -144,9 +145,30 @@ void
amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
   	uint32_t seq;
spin_lock_irqsave(&kiq->ring_lock, flags);
-	amdgpu_ring_alloc(ring, 32);
+	amdgpu_ring_alloc(ring, 60);
+
+	/*
+	 * It may lose gpuvm invalidate acknowldege state across power-gating
+	 * off cycle, add semaphore acquire before invalidation and semaphore
+	 * release after invalidation to avoid entering power gated state
+	 * to WA the Issue
+	 */
+
+	/* a read return value of 1 means semaphore acuqire */
+	if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+	    ring->funcs->vmhub == AMDGPU_MMHUB_1)
+	amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
+
   	amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
   					    ref, mask);
+	/*
+	 * add semaphore release after invalidation,
+	 * write with 0 means semaphore release
+	 */
+	if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+	    ring->funcs->vmhub == AMDGPU_MMHUB_1)
+	amdgpu_ring_emit_wreg(ring, sem, 0);
+
   	amdgpu_fence_emit_polling(ring, &seq);
   	amdgpu_ring_commit(ring);
   	spin_unlock_irqrestore(&kiq->ring_lock, flags); diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index b0b2bdc750df..bda6a2f37dc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, uint32_t reg);
   void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v);
   void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
   					uint32_t reg0, uint32_t rreg1,
-					uint32_t ref, uint32_t mask);
+					uint32_t ref, uint32_t mask,
+					uint32_t sem);
   int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool init);
   int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, bool init);
   int amdgpu_virt_reset_gpu(struct amdgpu_device *adev); diff --git
a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f25cd97ba5f2..1ae59af7836a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -448,9 +448,10 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
   			!adev->in_gpu_reset) {
   		uint32_t req = hub->vm_inv_eng0_req + eng;
   		uint32_t ack = hub->vm_inv_eng0_ack + eng;
+		uint32_t sem = hub->vm_inv_eng0_sem + eng;
amdgpu_virt_kiq_reg_write_reg_wait(adev, req, ack, tmp,
-				1 << vmid);
+						   1 << vmid, sem);
   		return;
   	}
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux