RE: [PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has been removed

[Public]

My mistake.

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Russell,
> Kent
> Sent: Friday, October 4, 2024 9:53 AM
> To: Chen, Xiaogang <Xiaogang.Chen@xxxxxxx>; Kuehling, Felix
> <Felix.Kuehling@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Yang, Philip <Philip.Yang@xxxxxxx>
> Subject: RE: [PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has
> been removed
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> > -----Original Message-----
> > From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Chen,
> > Xiaogang
> > Sent: Thursday, October 3, 2024 6:11 PM
> > To: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Cc: Yang, Philip <Philip.Yang@xxxxxxx>
> > Subject: Re: [PATCH v2] drm/amdkfd: not restore userptr buffer if kfd process has
> > been removed
> >
> >
> > On 10/3/2024 4:11 PM, Felix Kuehling wrote:
> > >
> > > On 2024-10-03 16:55, Xiaogang.Chen wrote:
> > >> From: Xiaogang Chen <xiaogang.chen@xxxxxxx>
> > >>
> > >> When the kfd process has been terminated, do not restore the userptr
> > >> buffer after the mmu notifier invalidates a range.
> > >>
> > >> Signed-off-by: Xiaogang Chen<Xiaogang.Chen@xxxxxxx>
> > >> ---
> > >>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 12 ++++++++----
> > >>   1 file changed, 8 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > >> index ce5ca304dba9..1df0926b63b3 100644
> > >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> > >> @@ -2524,11 +2524,15 @@ int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni,
> > >>           /* First eviction, stop the queues */
> > >>           r = kgd2kfd_quiesce_mm(mni->mm,
> > >>                          KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
> > >> -        if (r)
> > >> +
> > >> +        if (r && r != -ESRCH)
> > >>               pr_err("Failed to quiesce KFD\n");
> > >> -        queue_delayed_work(system_freezable_wq,
> > >> -            &process_info->restore_userptr_work,
> > >> -            msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
> > >> +
> > >> +        if (!r || r != -ESRCH) {
> > >
> > > This condition is always true.
> > >
> > Not sure why this condition is always true. kgd2kfd_quiesce_mm can
> > return -ESRCH when it cannot find the kfd process corresponding to mni->mm;
> > in that case the check above is false, so the restore work item is not
> > queued into system_freezable_wq.
>
> If you expand the 2 conditions, it becomes "if (r !=0 || r != -3)", which will always be
> true for any value of r.
>
I got this wrong. So it's either r==0 or r==-3 (I need some caffeine). The function passes return values back up from evict_queues and mqd_destroy, and can eventually return EIO or ETIME from the hqd_destroy function, so r can indeed take values other than 0 and -3. Sorry for the confusion.

 Kent

>  Kent
>
> >
> > Regards
> >
> > Xiaogang
> >
> > > Regards,
> > >   Felix
> > >
> > >
> > >> +            queue_delayed_work(system_freezable_wq,
> > >> +                &process_info->restore_userptr_work,
> > >> +                msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
> > >> +        }
> > >>       }
> > >>       mutex_unlock(&process_info->notifier_lock);



