Re: [PATCH] drm/amdkfd: Fix eviction fence handling

Tested-by: Gang BA <Gang.Ba@xxxxxxx>
Reviewed-by: Gang BA <Gang.Ba@xxxxxxx>

From: Kuehling, Felix <Felix.Kuehling@xxxxxxx>
Sent: Wednesday, April 17, 2024 11:14 PM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx <amd-gfx@xxxxxxxxxxxxxxxxxxxxx>
Cc: Ba, Gang <Gang.Ba@xxxxxxx>; Prosyak, Vitaly <Vitaly.Prosyak@xxxxxxx>
Subject: [PATCH] drm/amdkfd: Fix eviction fence handling

Handle the case where dma_fence_get_rcu_safe() returns NULL.

If the restore work is already scheduled, only update its timer with
mod_delayed_work(). The same work item cannot be queued twice, so undo
the extra queue eviction in that case.

Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs")
Signed-off-by: Felix Kuehling <felix.kuehling@xxxxxxx>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index b79986412cd8..aafdf064651f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1922,6 +1922,8 @@ static int signal_eviction_fence(struct kfd_process *p)
         ef = dma_fence_get_rcu_safe(&p->ef);
+       if (!ef)
+               return -EINVAL;
         ret = dma_fence_signal(ef);
@@ -1949,10 +1951,9 @@ static void evict_process_worker(struct work_struct *work)
                  * they are responsible stopping the queues and scheduling
                  * the restore work.
-               if (!signal_eviction_fence(p))
-                       queue_delayed_work(kfd_restore_wq, &p->restore_work,
-                               msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
-               else
+               if (signal_eviction_fence(p) ||
+                   mod_delayed_work(kfd_restore_wq, &p->restore_work,
+                                    msecs_to_jiffies(PROCESS_RESTORE_TIME_MS)))
                 pr_debug("Finished evicting pasid 0x%x\n", p->pasid);
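
For readers following the control flow, here is a minimal userspace sketch of
the new condition. It is not the driver code. It assumes only the documented
behaviour of mod_delayed_work(), which returns true when the work item was
already pending and only its timer was modified, and false when the item was
idle and got queued. The struct fake_process type, the *_model helpers, and
the queue_evictions counter are hypothetical stand-ins invented for this
illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the process state touched by the patch. */
struct fake_process {
	bool has_fence;        /* models p->ef being non-NULL */
	bool restore_pending;  /* models p->restore_work being queued */
	int queue_evictions;   /* models the queue eviction count */
};

/* Models signal_eviction_fence(): fails if there is no fence to signal. */
static int signal_eviction_fence_model(struct fake_process *p)
{
	if (!p->has_fence)
		return -EINVAL;
	p->has_fence = false;
	return 0;
}

/*
 * Models mod_delayed_work(): returns true if the work was already
 * pending (timer updated only), false if it was idle and is now queued.
 */
static bool mod_delayed_work_model(struct fake_process *p)
{
	bool was_pending = p->restore_pending;

	p->restore_pending = true;
	return was_pending;
}

/* Models the patched branch in evict_process_worker(). */
static void evict_model(struct fake_process *p)
{
	p->queue_evictions++;  /* the eviction performed by this worker */

	/*
	 * Undo the eviction if the fence could not be signaled, or if
	 * restore work was already scheduled and would run only once.
	 */
	if (signal_eviction_fence_model(p) || mod_delayed_work_model(p))
		p->queue_evictions--;
}
```

In this model, two back-to-back evictions leave exactly one outstanding
eviction for the single pending restore to balance, and an eviction without a
fence leaves none and schedules nothing, because the || short-circuits before
the restore work is touched.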

