[PATCH 6.0 140/190] drm/amdgpu: workaround for TLB seq race

Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> · Mon, 14 Nov 2022 13:46:04 +0100

From: Christian König <christian.koenig@xxxxxxx>

commit 77c092e054262b594614bad5e5f47e57c5d29639 upstream.

It can happen that we query the sequence value before the callback
had a chance to run.

Workaround that by grabbing the fence lock and releasing it again.
Should be replaced by hw handling soon.

Signed-off-by: Christian König <christian.koenig@xxxxxxx>
CC: stable@xxxxxxxxxxxxxxx # 5.19+
Fixes: 5255e146c99a6 ("drm/amdgpu: rework TLB flushing")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2113
Acked-by: Alex Deucher <alexander.deucher@xxxxxxx>
Acked-by: Philip Yang <Philip.Yang@xxxxxxx>
Tested-by: Stefan Springer <stefanspr94@xxxxxxxxx>
Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -485,6 +485,21 @@ void amdgpu_debugfs_vm_bo_info(struct am
  */
 static inline uint64_t amdgpu_vm_tlb_seq(struct amdgpu_vm *vm)
 {
+	unsigned long flags;
+	spinlock_t *lock;
+
+	/*
+	 * Workaround to stop racing between the fence signaling and handling
+	 * the cb. The lock is static after initially setting it up, just make
+	 * sure that the dma_fence structure isn't freed up.
+	 */
+	rcu_read_lock();
+	lock = vm->last_tlb_flush->lock;
+	rcu_read_unlock();
+
+	spin_lock_irqsave(lock, flags);
+	spin_unlock_irqrestore(lock, flags);
+
 	return atomic64_read(&vm->tlb_seq);
 }