Patch "drm/amdgpu: add lock in kfd_process_dequeue_from_device" has been added to the 6.10-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Wed, 4 Sep 2024 13:48:31 -0400

This is a note to let you know that I've just added the patch titled

    drm/amdgpu: add lock in kfd_process_dequeue_from_device

to the 6.10-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     drm-amdgpu-add-lock-in-kfd_process_dequeue_from_devi.patch
and it can be found in the queue-6.10 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit fc119049d26c16969653237c633a7e219556c1dc
Author: Yunxiang Li <Yunxiang.Li@xxxxxxx>
Date:   Mon Jun 3 12:29:30 2024 -0400

    drm/amdgpu: add lock in kfd_process_dequeue_from_device
    
    [ Upstream commit d225960c2330e102370815367b877baaf8bb8b5d ]
    
    We need to take the reset domain lock before talking to MES. In this
    case we could take the lock inside the MES helper, but we can't do so
    for most other MES helpers since they are used during reset. So for
    consistency's sake we add the lock here, in the caller.
    
    Signed-off-by: Yunxiang Li <Yunxiang.Li@xxxxxxx>
    Reviewed-by: Felix Kuehling <felix.kuehling@xxxxxxx>
    Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 4858112f9a53..a5bdc3258ae5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -28,6 +28,7 @@
 #include "kfd_priv.h"
 #include "kfd_kernel_queue.h"
 #include "amdgpu_amdkfd.h"
+#include "amdgpu_reset.h"
 
 static inline struct process_queue_node *get_queue_by_qid(
 			struct process_queue_manager *pqm, unsigned int qid)
@@ -87,8 +88,12 @@ void kfd_process_dequeue_from_device(struct kfd_process_device *pdd)
 		return;
 
 	dev->dqm->ops.process_termination(dev->dqm, &pdd->qpd);
-	if (dev->kfd->shared_resources.enable_mes)
-		amdgpu_mes_flush_shader_debugger(dev->adev, pdd->proc_ctx_gpu_addr);
+	if (dev->kfd->shared_resources.enable_mes &&
+	    down_read_trylock(&dev->adev->reset_domain->sem)) {
+		amdgpu_mes_flush_shader_debugger(dev->adev,
+						 pdd->proc_ctx_gpu_addr);
+		up_read(&dev->adev->reset_domain->sem);
+	}
 	pdd->already_dequeued = true;
 }