I think this workaround was reverted later after a firmware fix.
Regards,
Felix
On 2023-03-30 at 15:42, Alex Deucher wrote:
From: Philip Yang <Philip.Yang@xxxxxxx>
MEC FW should flush the TLB and cache when unmapping user queues. This
is not working correctly in the master FW via HIQ. It affects SDMA queues,
which use mmhub on AID, and causes several KFDTest failures.
Work around this in KFD for now. This patch will be reverted later to
verify the FW fix.
Signed-off-by: Philip Yang <Philip.Yang@xxxxxxx>
Tested-by: David Francis <David.Francis@xxxxxxx>
Reviewed-by: Felix Kuehling <felix.kuehling@xxxxxxx>
Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index ab91a0e211c8..1d53cbc55253 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1038,6 +1038,15 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
 			KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES :
 			KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
 
+	/* Workaround MEC mmhub flush issue
+	 * explicit heavyweight TLB flush after all unmap_queues calls
+	 *
+	 * It would not help if the firmware is unmapping queues itself when the
+	 * runlist is oversubscribed.
+	 */
+	atomic64_set(&pdd->tlb_seq, 0);
+	kfd_flush_tlb(pdd, TLB_FLUSH_HEAVYWEIGHT);
+
 out:
 	dqm_unlock(dqm);
 	return retval;
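
For readers wondering why the patch resets pdd->tlb_seq before calling
kfd_flush_tlb(): the sketch below is a userspace model, not driver code. It
assumes that the flush helper skips the flush when the per-process sequence
number already matches the VM's current one, so writing 0 first forces the
flush to happen. All model_* names are hypothetical stand-ins, and the model
assumes the VM sequence number is nonzero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver structures; not the kernel types. */
struct model_vm {
	uint64_t tlb_seq;		/* bumped whenever page tables change */
};

struct model_pdd {
	struct model_vm *vm;
	_Atomic uint64_t tlb_seq;	/* sequence recorded at the last flush */
	unsigned int flushes;		/* number of flushes actually issued */
};

/* Returns true if a flush was issued, false if it was skipped. */
static bool model_flush_tlb(struct model_pdd *pdd)
{
	uint64_t cur;

	assert(pdd && pdd->vm);
	cur = pdd->vm->tlb_seq;

	/* Record the current sequence; skip if it was already recorded. */
	if (atomic_exchange(&pdd->tlb_seq, cur) == cur)
		return false;

	pdd->flushes++;
	return true;
}

/* Mirrors the two added lines: reset the cached sequence, then flush. */
static bool model_force_flush_tlb(struct model_pdd *pdd)
{
	atomic_store(&pdd->tlb_seq, 0);
	return model_flush_tlb(pdd);
}
```

In this model a second model_flush_tlb() call with an unchanged VM sequence
is skipped, while model_force_flush_tlb() always issues a flush. That is the
behaviour the workaround relies on after unmap_queues, when the page tables
have not changed but the MEC flush cannot be trusted.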