On 2024-10-18 01:31, Zhao, Victor
wrote:
Reviewed-by: Philip Yang <Philip.Yang@xxxxxxx>[AMD Official Use Only - AMD Internal Distribution Only] [AMD Official Use Only - AMD Internal Distribution Only] Ping. Please help review. Thanks, Victor -----Original Message----- From: Victor Zhao <Victor.Zhao@xxxxxxx> Sent: Thursday, October 17, 2024 4:35 PM To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx Cc: Zhao, Victor <Victor.Zhao@xxxxxxx> Subject: [PATCH] drm/amdkfd: fix the hang caused by the write reorder to fence_addr make sure KFD_FENCE_INIT write to fence_addr before pm_send_query_status called, to avoid qcm fence timeout caused by incorrect ordering. Signed-off-by: Victor Zhao <Victor.Zhao@xxxxxxx>
--- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 + drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index b2b16a812e73..d9264a353775 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -2254,6 +2254,7 @@ static int unmap_queues_cpsch(struct device_queue_manager *dqm, goto out; *dqm->fence_addr = KFD_FENCE_INIT; + mb(); pm_send_query_status(&dqm->packet_mgr, dqm->fence_gpu_addr, KFD_FENCE_COMPLETED); /* should be timed out */ diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index 09ab36f8e8c6..bddb169bb301 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -260,7 +260,7 @@ struct device_queue_manager { uint16_t vmid_pasid[VMID_NUM]; uint64_t pipelines_addr; uint64_t fence_gpu_addr; - uint64_t *fence_addr; + volatile uint64_t *fence_addr; struct kfd_mem_obj *fence_mem; bool active_runlist; int sched_policy; -- 2.34.1