On 2024-10-04 10:39, Russell, Kent wrote: > [Public] > > My mistake. > >> -----Original Message----- >> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Russell, >> Kent >> Sent: Friday, October 4, 2024 9:53 AM >> To: Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>; Kuehling, Felix >> <Felix.Kuehling@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >> Cc: Yang, Philip <Philip.Yang@xxxxxxx> >> Subject: RE: [PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has >> been removed >> >> [AMD Official Use Only - AMD Internal Distribution Only] >> >> [AMD Official Use Only - AMD Internal Distribution Only] >> >>> -----Original Message----- >>> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Chen, >>> Xiaogang >>> Sent: Thursday, October 3, 2024 6:11 PM >>> To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>> Cc: Yang, Philip <Philip.Yang@xxxxxxx> >>> Subject: Re: [PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has >>> been removed >>> >>> >>> On 10/3/2024 4:11 PM, Felix Kuehling wrote: >>>> >>>> On 2024-10-03 16:55, Xiaogang.Chen wrote: >>>>> From: Xiaogang Chen <xiaogang.chen@xxxxxxx> >>>>> >>>>> When kfd process has been terminated not restore userptr buffer after >>>>> mmu notifier invalidates a range. >>>>> >>>>> Signed-off-by: Xiaogang Chen<Xiaogang.Chen@xxxxxxx> >>>>> --- >>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++++++++---- >>>>> 1 file changed, 8 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>> index ce5ca304dba9..1df0926b63b3 100644 >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>> @@ -2524,11 +2524,15 @@ int amdgpu_amdkfd_evict_userptr(struct >>>>> mmu_interval_notifier *mni, >>>>> /* First eviction, stop the queues */ >>>>> r = kgd2kfd_quiesce_mm(mni->mm, >>>>> KFD_QUEUE_EVICTION_TRIGGER_USERPTR); >>>>> - if (r) >>>>> + >>>>> + if (r && r != -ESRCH) >>>>> pr_err("Failed to quiesce KFD\n"); >>>>> - queue_delayed_work(system_freezable_wq, >>>>> - &process_info->restore_userptr_work, >>>>> - msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS)); >>>>> + >>>>> + if (!r || r != -ESRCH) { >>>> >>>> This condition is always true. >>>> >>> so sure why this condition is always true? kgd2kfd_quiesce_mm can >>> return -ESRCH when it cannot find kfd process correspondent to mni->mm, >>> then above checking will be false, then will not queue restore work item >>> into system_freezable_wq. >> >> If you expand the 2 conditions, it becomes "if (r !=0 || r != -3)", which will always be >> true for any value of r. >> > I got this wrong. So it's either r==0 or r==-3 (I need some caffeine). The function returns things back up from evict_queues, mqd_destroy, and can eventually return EIO or ETIME in the hqd_destroy function, so r can indeed be different values than 0/-3. Sorry for my confusion here. It's (r == 0 || r != -3). So yeah, I was wrong, it's not always true. But it's still confusing because (r == 0) is redundant. I guess you could just write if (r != -ESRCH) { ... } I don't know if that matches your intention. Regards, Felix > > Kent > >> Kent >> >>> >>> Regards >>> >>> Xiaogang >>> >>>> Regards, >>>> Felix >>>> >>>> >>>>> + queue_delayed_work(system_freezable_wq, >>>>> + &process_info->restore_userptr_work, >>>>> + msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS)); >>>>> + } >>>>> } >>>>> mutex_unlock(&process_info->notifier_lock);