Re: [PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has been removed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 10/4/2024 9:46 AM, Felix Kuehling wrote:

On 2024-10-04 10:39, Russell, Kent wrote:
[Public]

My mistake.

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Russell,
Kent
Sent: Friday, October 4, 2024 9:53 AM
To: Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>; Kuehling, Felix
<Felix.Kuehling@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Yang, Philip <Philip.Yang@xxxxxxx>
Subject: RE: [PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has
been removed

[AMD Official Use Only - AMD Internal Distribution Only]

[AMD Official Use Only - AMD Internal Distribution Only]

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Chen,
Xiaogang
Sent: Thursday, October 3, 2024 6:11 PM
To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Yang, Philip <Philip.Yang@xxxxxxx>
Subject: Re: [PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has
been removed


On 10/3/2024 4:11 PM, Felix Kuehling wrote:
On 2024-10-03 16:55, Xiaogang.Chen wrote:
From: Xiaogang Chen <xiaogang.chen@xxxxxxx>

When kfd process has been terminated not restore userptr buffer after
mmu notifier invalidates a range.

Signed-off-by: Xiaogang Chen<Xiaogang.Chen@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++++++++----
   1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index ce5ca304dba9..1df0926b63b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2524,11 +2524,15 @@ int amdgpu_amdkfd_evict_userptr(struct
mmu_interval_notifier *mni,
           /* First eviction, stop the queues */
           r = kgd2kfd_quiesce_mm(mni->mm,
                          KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
-        if (r)
+
+        if (r && r != -ESRCH)
               pr_err("Failed to quiesce KFD\n");
-        queue_delayed_work(system_freezable_wq,
-            &process_info->restore_userptr_work,
-            msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
+
+        if (!r || r != -ESRCH) {
This condition is always true.

so sure why this condition is always true?  kgd2kfd_quiesce_mm can
return -ESRCH when it cannot find kfd process correspondent to mni->mm,
then above checking will be false, then will not queue restore work item
into system_freezable_wq.
If you expand the 2 conditions, it becomes "if (r !=0 || r != -3)", which will always be
true for any value of r.

I got this wrong. So it's either r==0 or r==-3 (I need some caffeine). The function returns things back up from evict_queues, mqd_destroy, and can eventually return EIO or ETIME in the hqd_destroy function, so r can indeed be different values than 0/-3. Sorry for my confusion here.
It's (r == 0 || r != -3). So yeah, I was wrong, it's not always true. But it's still confusing because (r == 0) is redundant. I guess you could just write

	if (r != -ESRCH) {
		...
	}

I don't know if that matches your intention.

Yes, the condition check confused people though not wrong. My thought was: when there is no error, or the error is not -3, do userptr restore. So only when error is -3, not do userptr restore.

I will send new one to simply it.


Thanks

Xiaogang


Regards,
   Felix

  Kent

  Kent

Regards

Xiaogang

Regards,
   Felix


+ queue_delayed_work(system_freezable_wq,
+                &process_info->restore_userptr_work,
+ msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
+        }
       }
       mutex_unlock(&process_info->notifier_lock);



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux