[PATCH] drm/amdgpu/sriov: Check whether pending jobs have finished to identify a bad job

drm_sched_free_job_work runs from a work queue, so even after a job has
finished in hardware, it still takes some time for
drm_sched_free_job_work to remove it from the pending list. Checking
only whether the pending list is non-empty can therefore report such a
job as still running.

Iterate over the pending job list and wait for each job to finish within
a timeout (1s by default), so that jobs which have not been cleaned up
yet, or which are about to finish, are not counted as running. If a wait
times out, return true.

Signed-off-by: Tong Liu01 <Tong.Liu01@xxxxxxx>
Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
Signed-off-by: Shikang Fan <shikang.fan@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)
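
The check `r <= 0` in the hunk below relies on the return convention of
dma_fence_wait_timeout(): a positive value (remaining jiffies) when the
fence signaled in time, 0 on timeout, and a negative error code otherwise.
The following is a minimal user-space sketch of that logic, not the driver
code itself. The names sim_job, sim_fence_wait_timeout and
sim_has_job_running are hypothetical stand-ins, and time is simulated per
job rather than shared across the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending scheduler job: it finishes
 * after `ticks_left` more ticks of simulated time. */
struct sim_job {
	long ticks_left;
};

/*
 * Models the return convention of dma_fence_wait_timeout():
 *   > 0  the job finished in time (value is the time remaining)
 *   == 0 the wait timed out
 * The real call may also return a negative error code, which this
 * sketch does not model.
 */
static long sim_fence_wait_timeout(struct sim_job *job, long timeout)
{
	assert(timeout >= 0);

	/* Already finished: report success even for a zero timeout. */
	if (job->ticks_left == 0)
		return timeout ? timeout : 1;

	/* Not finished by the deadline: consume the timeout, report 0. */
	if (job->ticks_left >= timeout) {
		job->ticks_left -= timeout;
		return 0;
	}

	/* Finished early: return the unused part of the timeout. */
	timeout -= job->ticks_left;
	job->ticks_left = 0;
	return timeout;
}

/*
 * Mirrors the loop added to amdgpu_device_has_job_running(): wait for
 * each pending job in turn and report "running" as soon as one wait
 * times out (or, in the real code, fails).
 */
static bool sim_has_job_running(struct sim_job *pending, size_t n,
				long timeout)
{
	for (size_t i = 0; i < n; i++) {
		long r = sim_fence_wait_timeout(&pending[i], timeout);

		if (r <= 0)
			return true;
	}
	return false;
}
```

With a timeout of 1000 ticks, a list whose jobs need 0, 300 and 999 ticks
is reported as not running, while a list containing a job that needs 5000
ticks is reported as running. In this sketch the worst-case wait grows
with the number of pending jobs, since each job gets its own timeout.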

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6c0ff1c2ae4c..83ce1c85e680 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -100,6 +100,7 @@ MODULE_FIRMWARE("amdgpu/navi12_gpu_info.bin");
 #define AMDGPU_PCIE_INDEX_FALLBACK (0x38 >> 2)
 #define AMDGPU_PCIE_INDEX_HI_FALLBACK (0x44 >> 2)
 #define AMDGPU_PCIE_DATA_FALLBACK (0x3C >> 2)
+#define AMDGPU_PENDING_JOB_TIMEOUT	msecs_to_jiffies(1000)
 
 static const struct drm_driver amdgpu_kms_driver;
 
@@ -5198,7 +5199,8 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
 bool amdgpu_device_has_job_running(struct amdgpu_device *adev)
 {
 	int i;
-	struct drm_sched_job *job;
+	struct drm_sched_job *job, *tmp;
+	long r;
 
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		struct amdgpu_ring *ring = adev->rings[i];
@@ -5207,11 +5209,20 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev)
 			continue;
 
 		spin_lock(&ring->sched.job_list_lock);
-		job = list_first_entry_or_null(&ring->sched.pending_list,
-					       struct drm_sched_job, list);
+
+		/* Iterate over the pending job list and wait for each job to
+		 * finish within the timeout (1s by default). If a wait times
+		 * out or fails, report that a job is still running.
+		 */
+		list_for_each_entry_safe(job, tmp, &ring->sched.pending_list, list) {
+			r = dma_fence_wait_timeout(&job->s_fence->finished,
+								false, AMDGPU_PENDING_JOB_TIMEOUT);
+			if (r <= 0) {
+				spin_unlock(&ring->sched.job_list_lock);
+				return true;
+			}
+		}
 		spin_unlock(&ring->sched.job_list_lock);
-		if (job)
-			return true;
 	}
 	return false;
 }
-- 
2.34.1



