[Public]

My mistake.

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Russell,
> Kent
> Sent: Friday, October 4, 2024 9:53 AM
> To: Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>; Kuehling, Felix
> <Felix.Kuehling@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Yang, Philip <Philip.Yang@xxxxxxx>
> Subject: RE: [PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has
> been removed
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> > -----Original Message-----
> > From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Chen,
> > Xiaogang
> > Sent: Thursday, October 3, 2024 6:11 PM
> > To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Cc: Yang, Philip <Philip.Yang@xxxxxxx>
> > Subject: Re: [PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has
> > been removed
> >
> >
> > On 10/3/2024 4:11 PM, Felix Kuehling wrote:
> > >
> > > On 2024-10-03 16:55, Xiaogang.Chen wrote:
> > >> From: Xiaogang Chen <xiaogang.chen@xxxxxxx>
> > >>
> > >> When kfd process has been terminated not restore userptr buffer after
> > >> mmu notifier invalidates a range.
> > >>
> > >> Signed-off-by: Xiaogang Chen<Xiaogang.Chen@xxxxxxx>
> > >> ---
> > >>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++++++++----
> > >>  1 file changed, 8 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > >> index ce5ca304dba9..1df0926b63b3 100644
> > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > >> @@ -2524,11 +2524,15 @@ int amdgpu_amdkfd_evict_userptr(struct
> > >> mmu_interval_notifier *mni,
> > >>           /* First eviction, stop the queues */
> > >>           r = kgd2kfd_quiesce_mm(mni->mm,
> > >>                          KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
> > >> -        if (r)
> > >> +
> > >> +        if (r && r != -ESRCH)
> > >>               pr_err("Failed to quiesce KFD\n");
> > >> -        queue_delayed_work(system_freezable_wq,
> > >> -            &process_info->restore_userptr_work,
> > >> -            msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
> > >> +
> > >> +        if (!r || r != -ESRCH) {
> > >
> > > This condition is always true.
> > >
> > Not sure why this condition is always true? kgd2kfd_quiesce_mm can
> > return -ESRCH when it cannot find the kfd process corresponding to mni->mm,
> > then the above check will be false, and the restore work item will not be
> > queued into system_freezable_wq.
>
> If you expand the 2 conditions, it becomes "if (r !=0 || r != -3)", which will always be
> true for any value of r.
>
I got this wrong. "!r" means "r == 0", so the expansion is "if (r == 0 || r != -3)", which is false only when r == -3 (I need some caffeine). The function passes return values back up from evict_queues and mqd_destroy, and can eventually return -EIO or -ETIME from the hqd_destroy function, so r can indeed take values other than 0/-3. Sorry for my confusion here.

Kent

> Kent
>
> >
> > Regards
> >
> > Xiaogang
> >
> > > Regards,
> > > Felix
> > >
> > >
> > >> +            queue_delayed_work(system_freezable_wq,
> > >> +                &process_info->restore_userptr_work,
> > >> +                msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
> > >> +        }
> > >>       }
> > >>       mutex_unlock(&process_info->notifier_lock);
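
As a quick check of the condition debated in this thread, here is a minimal standalone C sketch (userspace, not part of the patch). The helper names patch_condition, simplified_condition, misexpanded_condition and check_conditions are made up for illustration; only the boolean expressions themselves come from the thread. It suggests that "!r || r != -ESRCH" reduces to "r != -ESRCH", so it is false only for -ESRCH, while the mis-expanded "r != 0 || r != -ESRCH" is true for every r.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Condition exactly as written in the v2 patch. */
static bool patch_condition(int r)
{
	return !r || r != -ESRCH;
}

/* Assumed simplification: false only when r == -ESRCH. */
static bool simplified_condition(int r)
{
	return r != -ESRCH;
}

/* The mis-expanded form quoted in the thread: no r makes it false. */
static bool misexpanded_condition(int r)
{
	return r != 0 || r != -ESRCH;
}

/* Compare the forms over 0 and a range of negative error codes. */
static void check_conditions(void)
{
	for (int r = -200; r <= 0; r++) {
		assert(patch_condition(r) == simplified_condition(r));
		assert(misexpanded_condition(r));
	}

	assert(patch_condition(0));       /* success: restore work is queued */
	assert(patch_condition(-EIO));    /* other errors: still queued */
	assert(patch_condition(-ETIME));
	assert(!patch_condition(-ESRCH)); /* process gone: not queued */
}
```

If that holds, the guard in the patch could presumably be written as just "if (r != -ESRCH)" with the same behavior.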