Re: [PATCH] drm/amdkfd: fix the hang caused by the write reorder to fence_addr

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2024-10-17 04:34, Victor Zhao wrote:
make sure KFD_FENCE_INIT write to fence_addr before pm_send_query_status
called, to avoid qcm fence timeout caused by incorrect ordering.

Signed-off-by: Victor Zhao <Victor.Zhao@xxxxxxx>
---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 +
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +-
  2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index b2b16a812e73..d9264a353775 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -2254,6 +2254,7 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm,
  		goto out;
*dqm->fence_addr = KFD_FENCE_INIT;
+	mb();
  	pm_send_query_status(&dqm->packet_mgr, dqm->fence_gpu_addr,
  				KFD_FENCE_COMPLETED);
  	/* should be timed out */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 09ab36f8e8c6..bddb169bb301 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -260,7 +260,7 @@ struct device_queue_manager {
  	uint16_t		vmid_pasid[VMID_NUM];
  	uint64_t		pipelines_addr;
  	uint64_t		fence_gpu_addr;
-	uint64_t		*fence_addr;
+	volatile uint64_t	*fence_addr;

[+Christian]

Is the volatile keyword really needed here? I just saw other patches removing volatile in some places because it's not sufficient, and not needed if you use memory barriers correctly.

Regards,
  Felix


  	struct kfd_mem_obj	*fence_mem;
  	bool			active_runlist;
  	int			sched_policy;



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux