On 2024-10-09 18:50, Chen, Xiaogang wrote:
On 10/9/2024 4:36 PM, Felix Kuehling wrote:
On 2024-10-04 11:54, Xiaogang.Chen wrote:
From: Xiaogang Chen <xiaogang.chen@xxxxxxx>
When kfd process has been terminated not restore userptr buffer
after mmu
notifier invalidates a range.
Is this fixing a real problem or a hypothetical problem? If there is
a real problem, can you include more information here? It looks to me
that amdgpu_amdkfd_restore_userptr_worker is already handling the
cases where a process or mm_struct no longer exists. Maybe the only
user visible change from this patch is, that you no longer print
"Failed to quiesce KFD" in a corner case of an MMU notifier for a
process that no longer exists?
Or is there a problem with the lifetime of the process_info that
contains the work_struct?
My thought was that restore userptr buffer is not needed anyway if the
correspondent kfd process has been removed. I noticed that during
another work. I do not see if still restoring userptr buffer would
cause issue when the kfd process not exist. Think avoid doing it is
safer, and not see there is regression.
Well, safer is usually not to change things that aren't broken. Anyway,
this seems reasonable enough. With the braces below removed, the patch is
Reviewed-by: Felix Kuehling <felix.kuehling@xxxxxxx>
Regards
Xiaogang
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index ce5ca304dba9..6b4be7893dfb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2524,11 +2524,15 @@ int amdgpu_amdkfd_evict_userptr(struct
mmu_interval_notifier *mni,
/* First eviction, stop the queues */
r = kgd2kfd_quiesce_mm(mni->mm,
KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
- if (r)
+
+ if (r && r != -ESRCH)
pr_err("Failed to quiesce KFD\n");
- queue_delayed_work(system_freezable_wq,
- &process_info->restore_userptr_work,
- msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
+
+ if (r != -ESRCH) {
Braces are not required because there is only one statement inside
the if.
Regards,
Felix
+ queue_delayed_work(system_freezable_wq,
+ &process_info->restore_userptr_work,
+ msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
+ }
}
mutex_unlock(&process_info->notifier_lock);