On 10/9/2024 4:36 PM, Felix Kuehling wrote:
On 2024-10-04 11:54, Xiaogang.Chen wrote:
From: Xiaogang Chen <xiaogang.chen@xxxxxxx>
When kfd process has been terminated not restore userptr buffer after
mmu
notifier invalidates a range.
Is this fixing a real problem or a hypothetical problem? If there is a
real problem, can you include more information here? It looks to me
that amdgpu_amdkfd_restore_userptr_worker is already handling the
cases where a process or mm_struct no longer exists. Maybe the only
user visible change from this patch is, that you no longer print
"Failed to quiesce KFD" in a corner case of an MMU notifier for a
process that no longer exists?
Or is there a problem with the lifetime of the process_info that
contains the work_struct?
My thought was that restore userptr buffer is not needed anyway if the
correspondent kfd process has been removed. I noticed that during
another work. I do not see if still restoring userptr buffer would cause
issue when the kfd process not exist. Think avoid doing it is safer, and
not see there is regression.
Regards
Xiaogang
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index ce5ca304dba9..6b4be7893dfb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2524,11 +2524,15 @@ int amdgpu_amdkfd_evict_userptr(struct
mmu_interval_notifier *mni,
/* First eviction, stop the queues */
r = kgd2kfd_quiesce_mm(mni->mm,
KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
- if (r)
+
+ if (r && r != -ESRCH)
pr_err("Failed to quiesce KFD\n");
- queue_delayed_work(system_freezable_wq,
- &process_info->restore_userptr_work,
- msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
+
+ if (r != -ESRCH) {
Braces are not required because there is only one statement inside the
if.
Regards,
Felix
+ queue_delayed_work(system_freezable_wq,
+ &process_info->restore_userptr_work,
+ msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
+ }
}
mutex_unlock(&process_info->notifier_lock);